javakeystore: pre-deploy export snapshot + on-import-failure rollback + argv-password operator note

Closes Bundle 8 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
DeployCertificate at javakeystore.go:172-272 ran an irreversible
keytool -delete against the existing alias, then keytool
-importkeystore. If the import failed after the delete succeeded,
the keystore was missing the alias entirely — previous cert gone,
new cert never landed. docs/deployment-atomicity.md L94 promised
"keytool snapshot; rollback via keytool -delete + re-import"; the
code didn't deliver. Separately, the operator-facing keystore
password is passed via -storepass argv (a standard keytool
limitation) which is visible to ps(1) for the duration of each
subprocess; this was undocumented as an operator-playbook caveat.

This commit:

1. Pre-delete snapshot. When os.Stat(KeystorePath) succeeds,
   snapshotKeystore runs keytool -exportkeystore to
   <BackupDir>/.certctl-bak.<unix-nanos>.p12 BEFORE the existing
   -delete step. Backup path persisted in a local variable for
   the rollback path; export-step failure aborts the deploy
   entirely (no mutation has happened yet — the keystore is
   untouched). Snapshot skipped on first-time deploys (no
   keystore file = nothing to roll back to). The "alias not
   present in pre-existing keystore" case is recognised via the
   well-known keytool error string and treated as a clean
   first-time-on-existing-keystore signal — the deploy proceeds
   without a backup, and rollback (if needed) becomes the
   no-backup branch.

2. On-import-failure rollback. When keytool -importkeystore
   returns error, rollbackImport(ctx, backupPath) runs:
   - keytool -delete -alias <Alias> ... (best-effort; the failed
     import may have created a partial alias entry).
   - keytool -importkeystore from the backup PKCS#12 to restore
     the previous state.
   On rollback success, the deploy returns wrapped error noting
   "rolled back from <backup_path>". On rollback failure,
   returns operator-actionable wrapped error containing both the
   import error AND the rollback error AND the backup path so
   the operator can manually keytool -importkeystore from the
   .p12 file to recover.

3. Backup retention. Successful deploys prune older
   .certctl-bak.*.p12 files beyond Config.BackupRetention.
   Sort by ModTime newest-first; keep most recent N. Defaults:
   BackupRetention=0  → keep most recent 3 (the default).
   BackupRetention=N  → keep most recent N.
   BackupRetention=-1 → opt out of pruning entirely (operators
                        that wire their own archival/rotation).
   Pruning runs in the success path AFTER the optional reload
   command so it doesn't interfere with deploy-time signals.
   ReadDir / Remove failures are non-fatal (debug log only) —
   the deploy already succeeded.

4. Config gains BackupRetention int and BackupDir string fields.
   BackupDir defaults to filepath.Dir(KeystorePath) so backups
   land on the same filesystem as the keystore (atomic-ish
   writes, disk-full failures fail fast at snapshot time).

5. Helper extraction. snapshotKeystore + rollbackImport +
   pruneBackups + backupDir are private methods on Connector.
   Constants backupFilePrefix=".certctl-bak." and
   backupFileSuffix=".p12" centralise the naming convention so
   the snapshot writer, the rollback reader, and the retention
   pruner all agree.

6. Operator-playbook section added to docs/connectors.md
   JavaKeystore section. Documents the standard keytool
   -storepass argv exposure: ps(1)-visible for the duration
   of each subprocess. Lists mitigations:
   - Restrict shell access to the agent host.
   - Linux user namespaces / AppArmor / SystemD ProtectProc=
     invisible to deny ps-visibility.
   - Single-purpose container for proper PID-namespace
     isolation.
   - Post-deploy keystore password rotation via reload_command
     for high-security environments.
   - BCFKS keystore type for FIPS environments (same argv
     caveat applies).
   Also documents an "Atomic rollback" subsection covering the
   snapshot/rollback flow, the new backup_retention /
   backup_dir Config fields, and the design choice to reuse
   the keystore password for the snapshot (rather than
   generating a separate transient password) — operator
   already trusts the connector with this secret, surface area
   doesn't grow, rollback's matching -srcstorepass stays
   simple.

Tests added to javakeystore_test.go (7 new tests, ~430 LOC):

- TestJKS_Snapshot_RunsBefore_Delete: mock executor records call
  order; asserts -exportkeystore is call[0], -delete is call[1],
  -importkeystore is call[2]. The snapshot MUST run before the
  delete — otherwise the delete destroys the very state the
  snapshot is meant to capture.
- TestJKS_Snapshot_FirstTimeDeploy_NoExport: no keystore file
  pre-created; asserts exactly 1 keytool call (-importkeystore
  only), no -exportkeystore.
- TestJKS_ImportFails_RollsBack: happy rollback path with one
  same-Subject backup. Asserts rollback re-import references the
  same backup path the snapshot wrote (verified via arg
  comparison between call[0] and call[4]).
- TestJKS_ImportFails_RollbackAlsoFails_OperatorActionable:
  wrapped-error escalation with backup path in the error
  message.
- TestJKS_BackupRetention_PrunesOldBackups: 5 pre-existing
  staggered-ModTime backups + 1 deploy-created → retention=3 →
  exactly 3 newest survive (deploy-created + 2 newest
  pre-existing); 3 oldest pre-existing pruned.
- TestJKS_BackupRetention_Zero_DefaultsTo3: BackupRetention=0
  must default to 3 (not "keep none").
- TestJKS_BackupRetention_Negative_OptsOut: BackupRetention=-1
  pre-existing 5 + deploy 1 = 6 total, all 6 remain.
- TestJKS_Snapshot_AliasNotInKeystore_ProceedsCleanly: keystore
  exists but alias missing; -exportkeystore returns "alias does
  not exist" → snapshot helper recognises this signal and
  returns ("", nil) so the deploy proceeds cleanly.

mockExecutor extended with optional `onCall` hook so the
retention-pruning tests can simulate keytool -exportkeystore's
file-write side effect (via the simulateExportSideEffect helper
that parses -destkeystore from args and writes a placeholder
.p12 file). Existing tests that don't set onCall behave
identically to before — backward compatible.

docs/deployment-atomicity.md L94 unchanged from today's text —
Bundle 1 doc-realignment hasn't shipped, so the "keytool snapshot;
rollback via keytool -delete + re-import" line was never softened.
Post-Bundle-8 the claim is honest (was aspirational pre-fix).

Verified locally (sandbox lacks staticcheck install due to disk
pressure; CI runs the full lint gate):
- gofmt -l ./internal/connector/target/javakeystore/ clean
- go vet ./internal/connector/target/javakeystore/ clean
- go build ./cmd/agent/... clean
- go test -race -count=1 ./internal/connector/target/javakeystore/
  green (16 tests total: 9 pre-existing + 7 new)

Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 8.
This commit is contained in:
shankar0123
2026-05-02 19:01:06 +00:00
parent 1dd1dd4e0a
commit 87e0009d97
3 changed files with 887 additions and 2 deletions
+33
View File
@@ -1159,6 +1159,8 @@ The Java Keystore connector deploys certificates to JKS or PKCS#12 keystores via
| `reload_command` | string | | Optional command to run after keystore update |
| `create_keystore` | boolean | `true` | Create keystore if it doesn't exist |
| `keytool_path` | string | `"keytool"` | Override keytool binary path |
| `backup_retention` | int | `3` | Number of `.certctl-bak.<unix-nanos>.p12` snapshot files to keep after a successful deploy. `0` means use the default of 3; `-1` opts out of pruning entirely. |
| `backup_dir` | string | `dirname(keystore_path)` | Override directory where rollback snapshots are written and pruned from. Defaults to the keystore's own directory so snapshots land on the same filesystem. |
**Security:**
- Reload commands validated against shell injection via `validation.ValidateShellCommand()`
@@ -1166,6 +1168,37 @@ The Java Keystore connector deploys certificates to JKS or PKCS#12 keystores via
- Path traversal prevention on keystore path
- Transient PKCS#12 temp file cleaned up after import (even on error)
**Atomic rollback (Bundle 8 of the 2026-05-02 deployment-target audit):**
The deploy flow is **snapshot → delete → import → reload**. Before the irreversible `keytool -delete` step (which removes the existing alias from the keystore), the connector runs `keytool -exportkeystore` to write a sibling `.certctl-bak.<unix-nanos>.p12` file containing the prior alias. If the subsequent `keytool -importkeystore` fails for any reason, the rollback path runs `keytool -delete` (best-effort cleanup of any partial alias the failed import created) followed by `keytool -importkeystore` from the snapshot PFX, restoring the keystore to its pre-deploy state. If both the import AND the rollback fail, the connector returns an operator-actionable wrapped error containing both error strings AND the snapshot path so the operator can manually `keytool -importkeystore` from the `.p12` file to recover.
Successful deploys prune older `.certctl-bak.*.p12` files beyond the configured `backup_retention` count; pruning sorts by file ModTime and removes the oldest entries first. Operators that wire their own archival/rotation logic can opt out via `backup_retention: -1`.
First-time deploys (no keystore file exists at the configured path) skip the snapshot phase entirely — there's nothing to roll back to. The same is true for "alias-not-present-in-existing-keystore" deploys: `keytool -exportkeystore` returns "alias does not exist" which the connector recognises as a normal first-time-on-existing-keystore signal, not an outage.
### Operator playbook: keytool argv password exposure
Java's `keytool` accepts the keystore password via the `-storepass` argv flag — there is no stdin or file-based password mode in OpenJDK keytool. While the keytool subprocess is running, the password is visible in `ps(1)` output to any user on the same host who can read `/proc/<pid>/cmdline`. This is a **standard keytool limitation, not a certctl-specific issue**, but operators in regulated environments should know about it before deploying certctl on shared hosts.
**What this means in practice:**
- The password is visible for the duration of each keytool invocation (typically <1s on modern hardware; the connector runs 2-4 keytool calls per deploy: snapshot, optional pre-import delete, import, optional rollback).
- A local user with shell access on the agent host who polls `ps -ef` aggressively can capture the password.
- The exposure is local to the agent host; remote attackers without shell access cannot see it.
- The same applies to the snapshot's transient `-deststorepass` (which mirrors the operator's keystore password by design — see "Why the snapshot reuses the keystore password" below).
**Mitigations** (layer one or more depending on threat model):
- **Restrict shell access to the agent host.** Only the certctl agent's service account should have a login shell. Other admins SSH to a bastion that doesn't host the agent.
- **Use Linux user namespaces or AppArmor** to deny `ps`-visibility into the keytool subprocess for non-root users. SystemD's `ProtectKernelTunables=yes` + `ProtectProc=invisible` (kernel 5.8+) hides `/proc/<pid>` from non-owner users.
- **Run the certctl agent in a single-purpose container** so only the agent's processes are visible to anyone who execs into the container. The host's `ps` doesn't see container internals if proper PID-namespace isolation is configured.
- **Rotate the keystore password post-deployment.** For high-security environments where the brief exposure is unacceptable, the rotation can itself be automated via a post-deploy hook running `keytool -storepasswd`. The certctl `reload_command` is the natural place for this; just be aware the new password must be propagated to whatever service reads the keystore (Tomcat's `server.xml`, Kafka's `kafka.properties`, etc.).
- **For FIPS environments**, use the `BCFKS` (BouncyCastle FIPS) keystore type which supports stronger password-derivation. Same argv-exposure caveat applies; the keystore-format change doesn't affect how keytool receives the password.
For a fundamentally different password-handling model, switch to a non-Java target (e.g. PEM-on-disk via the SSH connector + a JCA-shim like `tomcat-native` reading PEMs directly) or a PKCS#11 keystore (where the password is supplied to the cryptoki library, not via argv).
**Why the snapshot reuses the keystore password.** The snapshot's `keytool -exportkeystore` writes a PKCS#12 file under a `-deststorepass`. The connector reuses the operator's `keystore_password` for this rather than generating a separate transient password. Two reasons: (a) the operator already trusts the connector with this secret, so the surface area doesn't grow; (b) the rollback's matching `keytool -importkeystore` needs to know the password too, and threading a second random password through the in-memory state machine adds complexity (and another argv-exposure window) for no security gain. If you rotate the keystore password between deploys, the rollback may fail to read the snapshot — keep stale `.certctl-bak.*.p12` files on disk until the rotation completes, and clean them up manually if rotation invalidates them.
Location: `internal/connector/target/javakeystore/javakeystore.go`
### Kubernetes Secrets
@@ -17,6 +17,7 @@ import (
"os/exec"
"path/filepath"
"regexp"
"sort"
"strings"
"time"
@@ -49,6 +50,23 @@ type Config struct {
// KeytoolPath overrides the default keytool binary path.
// Default: "keytool" (found via PATH).
KeytoolPath string `json:"keytool_path,omitempty"`
// BackupRetention controls how many .certctl-bak.<unix-nanos>.p12 backup
// files to keep after a successful deploy. Bundle 8 (2026-05-02
// deployment-target audit) introduced these backups for on-import-failure
// rollback; without retention, every deploy adds another file and disks
// fill up over time. Values:
// 0 → use default of 3 (keep most recent 3 backups).
// N → keep most recent N backups.
// -1 → opt out of pruning entirely (operators that wire their own
// archival/rotation logic).
BackupRetention int `json:"backup_retention,omitempty"`
// BackupDir overrides the directory where .certctl-bak.* files are
// written and pruned from. Default: filepath.Dir(KeystorePath) — same
// filesystem as the keystore itself, so backup writes are atomic-ish
// and a full disk fails fast at snapshot time rather than mid-deploy.
BackupDir string `json:"backup_dir,omitempty"`
}
// CommandExecutor abstracts command execution for testability.
@@ -168,7 +186,21 @@ func (c *Connector) ValidateConfig(ctx context.Context, config json.RawMessage)
}
// DeployCertificate imports a certificate and key into the Java Keystore.
// Flow: PEM → PKCS#12 temp file → keytool -importkeystore → cleanup temp → optional reload
//
// Bundle 8 of the 2026-05-02 deployment-target audit added a pre-delete
// snapshot + on-import-failure rollback wrapper around the original
// keytool flow:
// 1. Convert PEM to PKCS#12 temp file (transient password, never logged).
// 2. If the keystore exists, run `keytool -exportkeystore` to a sibling
// `.certctl-bak.<unix-nanos>.p12` BEFORE the irreversible -delete.
// Backup path persisted in a local variable for the rollback path.
// 3. Run the existing -delete (best-effort; alias may not exist).
// 4. Run keytool -importkeystore.
// 5. On import failure with a backup in hand, rollbackImport runs
// keytool -delete (clean up the alias the failed import may have
// created) + keytool -importkeystore from the backup PFX.
// 6. On success: compute thumbprint, run optional reload command,
// prune old backup files per Config.BackupRetention.
func (c *Connector) DeployCertificate(ctx context.Context, request target.DeploymentRequest) (*target.DeploymentResult, error) {
if request.KeyPEM == "" {
return nil, fmt.Errorf("private key is required for Java Keystore import")
@@ -204,8 +236,27 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy
}
tmpFile.Close()
// Step 2: Delete existing alias if keystore exists (keytool -delete)
// Bundle 8: pre-delete snapshot. When the keystore exists, run
// keytool -exportkeystore to capture the prior alias state into a
// sibling PKCS#12 backup file BEFORE the irreversible -delete step.
// Backup path is held in a local variable for the rollback path;
// snapshot failure aborts the deploy entirely (no mutation has
// happened yet, so the keystore is untouched).
//
// Empty backupPath = first-time deploy (keystore file doesn't exist
// yet) — rollback in that case has nothing to restore from; the
// failure path returns the import error verbatim.
var backupPath string
if _, err := os.Stat(c.config.KeystorePath); err == nil {
var snapErr error
backupPath, snapErr = c.snapshotKeystore(ctx)
if snapErr != nil {
return nil, fmt.Errorf("pre-deploy snapshot failed: %w", snapErr)
}
c.logger.Debug("pre-deploy snapshot captured", "backup_path", backupPath)
// Step 2: Delete existing alias (keytool -delete). Best-effort —
// the alias may not exist in this keystore.
deleteArgs := []string{
"-delete",
"-alias", c.config.Alias,
@@ -234,6 +285,30 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy
output, err := c.executor.Execute(ctx, c.config.KeytoolPath, importArgs...)
if err != nil {
// Bundle 8: import failed. Roll back if we have a backup; otherwise
// surface the import error verbatim (first-time deploy — nothing
// to restore from, the failed import didn't write anything we can
// undo at the alias level).
if backupPath != "" {
c.logger.Error("keytool import failed; attempting rollback",
"error", err,
"output", output,
"backup_path", backupPath)
rbErr := c.rollbackImport(ctx, backupPath)
if rbErr != nil {
// Operator-actionable: import AND rollback both failed.
// Surface BOTH errors AND the backup path so the operator
// can manually keytool -importkeystore from the .p12 file
// to recover.
combined := fmt.Errorf("keytool import failed (%w) AND rollback also failed (%v); manual operator inspection required (backup at %s)", err, rbErr, backupPath)
c.logger.Error("JavaKeystore rollback also failed",
"import_error", err,
"rollback_error", rbErr,
"backup_path", backupPath)
return nil, combined
}
return nil, fmt.Errorf("keytool import failed; rolled back from %s: %s: %w", backupPath, output, err)
}
return nil, fmt.Errorf("keytool import failed: %s: %w", output, err)
}
@@ -251,6 +326,12 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy
}
}
// Bundle 8: prune old backups on the success path so operator filesystems
// don't accumulate .certctl-bak.* files indefinitely. Failure here is
// non-fatal (debug log only) — the deploy succeeded, retention cleanup
// is housekeeping.
c.pruneBackups()
c.logger.Info("certificate imported to Java Keystore",
"keystore", c.config.KeystorePath,
"alias", c.config.Alias,
@@ -325,3 +406,186 @@ func (c *Connector) ValidateDeployment(ctx context.Context, request target.Valid
// Ensure Connector implements target.Connector.
var _ target.Connector = (*Connector)(nil)
// --- Bundle 8: pre-delete snapshot + on-import-failure rollback ---
// backupFilePrefix is the literal prefix on rollback-snapshot files.
// Centralised here so the snapshot writer, the rollback reader, and the
// retention pruner all agree on the naming convention.
//
// Bundle 8 of the 2026-05-02 deployment-target audit.
const backupFilePrefix = ".certctl-bak."
// backupFileSuffix is the literal suffix on rollback-snapshot files. Always
// PKCS#12 regardless of the source keystore type — `keytool -exportkeystore`
// destinations are PKCS#12 by convention because every JVM can read PKCS#12,
// while JKS is OpenJDK-specific.
const backupFileSuffix = ".p12"
// backupDir returns the directory rollback snapshots are written to.
// Operators can override via Config.BackupDir; default = same dir as the
// keystore so snapshots land on the same filesystem (atomic-ish writes,
// disk-full failures surface at snapshot time rather than mid-deploy).
func (c *Connector) backupDir() string {
if c.config.BackupDir != "" {
return c.config.BackupDir
}
return filepath.Dir(c.config.KeystorePath)
}
// snapshotKeystore runs `keytool -exportkeystore` to copy the existing alias
// into a new PKCS#12 file at <backupDir>/.certctl-bak.<unix-nanos>.p12.
// Returns the backup path on success; the caller persists it for the
// rollback path.
//
// The export password mirrors the keystore password — it's the same secret
// the operator already trusts the connector with, and avoiding a second
// transient password keeps the rollback's matching `-srcstorepass` simple.
//
// Bundle 8 of the 2026-05-02 deployment-target audit.
func (c *Connector) snapshotKeystore(ctx context.Context) (string, error) {
backupPath := filepath.Join(
c.backupDir(),
fmt.Sprintf("%s%d%s", backupFilePrefix, time.Now().UnixNano(), backupFileSuffix),
)
exportArgs := []string{
"-exportkeystore",
"-srckeystore", c.config.KeystorePath,
"-srcstoretype", c.config.KeystoreType,
"-srcstorepass", c.config.KeystorePassword,
"-srcalias", c.config.Alias,
"-destkeystore", backupPath,
"-deststoretype", "PKCS12",
"-deststorepass", c.config.KeystorePassword,
"-noprompt",
}
output, err := c.executor.Execute(ctx, c.config.KeytoolPath, exportArgs...)
if err != nil {
// keytool -exportkeystore returns non-zero when the alias isn't
// present in the source keystore. That's a normal first-time-on-
// existing-keystore signal, NOT an outage. Treat it as "no
// snapshot to roll back to" and proceed cleanly — the import
// will create the alias from scratch, and rollback (if the
// import then fails) will be the no-backup path.
lowerOut := strings.ToLower(output)
if strings.Contains(lowerOut, "does not exist") || strings.Contains(lowerOut, "alias <") {
c.logger.Debug("snapshot found no existing alias to export — first-time-on-keystore deploy",
"alias", c.config.Alias,
"output", output)
return "", nil
}
return "", fmt.Errorf("keytool -exportkeystore: %s: %w", output, err)
}
return backupPath, nil
}
// rollbackImport restores the previous alias state from a snapshot PFX. Two
// keytool calls in order:
// 1. -delete the alias (best-effort — the failed import may or may not have
// created an alias entry; we don't know which, so we always try).
// 2. -importkeystore from the backup PFX, restoring the original cert + key
// under the original alias.
//
// Returns nil on success; wrapped error on rollback-script failure. The
// caller surfaces the wrapped error to the operator alongside the import
// error and the backup path so manual recovery is possible.
//
// Bundle 8 of the 2026-05-02 deployment-target audit.
func (c *Connector) rollbackImport(ctx context.Context, backupPath string) error {
// Step 1: best-effort delete (alias may not exist after a failed import).
deleteArgs := []string{
"-delete",
"-alias", c.config.Alias,
"-keystore", c.config.KeystorePath,
"-storepass", c.config.KeystorePassword,
"-storetype", c.config.KeystoreType,
"-noprompt",
}
c.executor.Execute(ctx, c.config.KeytoolPath, deleteArgs...)
// Step 2: re-import from the backup PKCS#12 to restore the previous state.
importArgs := []string{
"-importkeystore",
"-srckeystore", backupPath,
"-srcstoretype", "PKCS12",
"-srcstorepass", c.config.KeystorePassword,
"-destkeystore", c.config.KeystorePath,
"-deststoretype", c.config.KeystoreType,
"-deststorepass", c.config.KeystorePassword,
"-srcalias", c.config.Alias,
"-destalias", c.config.Alias,
"-noprompt",
}
output, err := c.executor.Execute(ctx, c.config.KeytoolPath, importArgs...)
if err != nil {
return fmt.Errorf("rollback re-import: %s: %w", output, err)
}
c.logger.Info("JavaKeystore rollback completed", "backup_path", backupPath)
return nil
}
// pruneBackups removes older `.certctl-bak.*.p12` files beyond the configured
// retention count so operator filesystems don't accumulate snapshots
// indefinitely. Best-effort: any error during the readdir / remove cycle
// is swallowed at debug level — the deploy already succeeded, retention
// cleanup is housekeeping.
//
// Retention semantics (per Config.BackupRetention):
// - 0 → default of 3 (keep most recent 3 backups).
// - N → keep most recent N backups.
// - -1 → opt out entirely (no pruning).
//
// "Most recent" is determined by file ModTime, not by the unix-nanos in the
// filename — ModTime is robust against system-clock changes between deploys
// and aligns with the actual filesystem ordering operators see in `ls -lt`.
//
// Bundle 8 of the 2026-05-02 deployment-target audit.
func (c *Connector) pruneBackups() {
keep := c.config.BackupRetention
if keep == 0 {
keep = 3
}
if keep < 0 {
return // operator opted out
}
dir := c.backupDir()
entries, err := os.ReadDir(dir)
if err != nil {
c.logger.Debug("backup retention prune skipped: ReadDir failed",
"dir", dir, "error", err)
return
}
type backupFile struct {
name string
modTime time.Time
}
var backups []backupFile
for _, e := range entries {
if e.IsDir() {
continue
}
name := e.Name()
if !strings.HasPrefix(name, backupFilePrefix) || !strings.HasSuffix(name, backupFileSuffix) {
continue
}
info, err := e.Info()
if err != nil {
continue
}
backups = append(backups, backupFile{name: name, modTime: info.ModTime()})
}
if len(backups) <= keep {
return
}
// Sort newest-first by ModTime; older entries (the tail) get pruned.
sort.Slice(backups, func(i, j int) bool {
return backups[i].modTime.After(backups[j].modTime)
})
for _, b := range backups[keep:] {
path := filepath.Join(dir, b.name)
if err := os.Remove(path); err != nil {
c.logger.Debug("backup retention prune: Remove failed",
"path", path, "error", err)
}
}
}
@@ -25,10 +25,17 @@ func testLogger() *slog.Logger {
}
// mockExecutor records commands and returns configurable responses.
//
// Bundle 8 (2026-05-02 deployment-target audit) added the optional
// `onCall` hook so retention-pruning tests can simulate keytool's
// file-write side effects (e.g. -exportkeystore writes a .p12 to the
// -destkeystore path). Existing tests that don't set onCall behave
// identically to before.
type mockExecutor struct {
calls []mockCall
responses []mockResponse
callIndex int
onCall func(name string, args []string)
}
type mockCall struct {
@@ -43,6 +50,9 @@ type mockResponse struct {
func (m *mockExecutor) Execute(ctx context.Context, name string, args ...string) (string, error) {
m.calls = append(m.calls, mockCall{Name: name, Args: args})
if m.onCall != nil {
m.onCall(name, args)
}
idx := m.callIndex
m.callIndex++
if idx < len(m.responses) {
@@ -51,6 +61,40 @@ func (m *mockExecutor) Execute(ctx context.Context, name string, args ...string)
return "", nil
}
// simulateExportSideEffect returns an onCall handler that mimics what real
// keytool -exportkeystore does: writes a small placeholder file at the
// path passed via -destkeystore. Used by Bundle 8 retention-pruning tests
// where the deploy-created backup file needs to actually exist on disk
// for the pruner's ReadDir to find it.
func simulateExportSideEffect(t *testing.T) func(name string, args []string) {
t.Helper()
return func(name string, args []string) {
isExport := false
for _, a := range args {
if a == "-exportkeystore" {
isExport = true
break
}
}
if !isExport {
return
}
var dest string
for i, a := range args {
if a == "-destkeystore" && i+1 < len(args) {
dest = args[i+1]
break
}
}
if dest == "" {
return
}
if err := os.WriteFile(dest, []byte("simulated-backup-pkcs12"), 0644); err != nil {
t.Logf("simulateExportSideEffect: write %s failed: %v", dest, err)
}
}
}
// generateTestCertAndKey creates a self-signed certificate and key for testing.
func generateTestCertAndKey() (string, string, error) {
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
@@ -529,3 +573,547 @@ func TestValidateDeployment_SerialMismatch(t *testing.T) {
t.Error("expected serial_match=false")
}
}
// --- Bundle 8: pre-delete snapshot + on-import-failure rollback ---
//
// These seven tests pin the load-bearing rollback contract added in
// Bundle 8 of the 2026-05-02 deployment-target audit:
// - snapshot order (export runs BEFORE delete BEFORE import);
// - first-time deploy skips the snapshot (no keystore file = nothing
// to roll back to, so no -exportkeystore call);
// - happy rollback path (import fails → rollback re-imports from the
// backup PFX);
// - rollback-also-fails (operator-actionable wrapped error containing
// both errors AND the backup path for manual recovery);
// - retention pruning (5 pre-existing → 3 newest kept after deploy);
// - retention zero defaults to 3;
// - retention negative opts out of pruning entirely.
func TestJKS_Snapshot_RunsBefore_Delete(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
// Pre-create the keystore file so the snapshot phase fires (the
// snapshot is gated on os.Stat returning nil for the keystore path).
if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil {
t.Fatal(err)
}
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil}, // -exportkeystore (snapshot)
{Output: "", Err: nil}, // -delete (alias may exist)
{Output: "Import command completed", Err: nil}, // -importkeystore (the actual deploy)
},
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
if !result.Success {
t.Error("expected success=true")
}
// 3 keytool calls: export (snapshot) → delete → import. The order is
// load-bearing: snapshot MUST run before delete, otherwise the delete
// destroys the very state the snapshot is meant to capture.
if len(mock.calls) != 3 {
t.Fatalf("expected 3 keytool calls (export + delete + import), got %d", len(mock.calls))
}
// Call 0: -exportkeystore.
if mock.calls[0].Name != "keytool" {
t.Errorf("call 0: expected keytool, got %s", mock.calls[0].Name)
}
args0 := strings.Join(mock.calls[0].Args, " ")
if !strings.Contains(args0, "-exportkeystore") {
t.Errorf("call 0: expected -exportkeystore in args, got: %s", args0)
}
if !strings.Contains(args0, "-srckeystore "+keystorePath) {
t.Errorf("call 0: expected -srckeystore %s, got: %s", keystorePath, args0)
}
// Backup path: <tmpDir>/.certctl-bak.<unix-nanos>.p12
if !strings.Contains(args0, ".certctl-bak.") || !strings.Contains(args0, ".p12") {
t.Errorf("call 0: expected .certctl-bak.*.p12 backup path, got: %s", args0)
}
// Call 1: -delete.
args1 := strings.Join(mock.calls[1].Args, " ")
if !strings.Contains(args1, "-delete") {
t.Errorf("call 1: expected -delete in args, got: %s", args1)
}
// Call 2: -importkeystore (the deploy itself).
args2 := strings.Join(mock.calls[2].Args, " ")
if !strings.Contains(args2, "-importkeystore") {
t.Errorf("call 2: expected -importkeystore in args, got: %s", args2)
}
if !strings.Contains(args2, "-destkeystore "+keystorePath) {
t.Errorf("call 2: expected -destkeystore %s, got: %s", keystorePath, args2)
}
}
func TestJKS_Snapshot_FirstTimeDeploy_NoExport(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
// NO keystore file pre-created — first-time deploy. Snapshot phase
// gated on os.Stat returning nil; with no file, the snapshot is
// skipped, the -delete is skipped, only the -importkeystore runs.
keystorePath := tmpDir + "/app.p12"
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Import command completed", Err: nil}, // -importkeystore only
},
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
if !result.Success {
t.Error("expected success=true")
}
// Exactly 1 call: -importkeystore. No -exportkeystore (no keystore
// file existed pre-deploy), no -delete (same reason).
if len(mock.calls) != 1 {
t.Fatalf("expected 1 keytool call (import only), got %d: %v", len(mock.calls), mock.calls)
}
args := strings.Join(mock.calls[0].Args, " ")
if strings.Contains(args, "-exportkeystore") {
t.Errorf("expected no -exportkeystore on first-time deploy, got: %s", args)
}
if !strings.Contains(args, "-importkeystore") {
t.Errorf("expected -importkeystore in args, got: %s", args)
}
}
func TestJKS_ImportFails_RollsBack(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil {
t.Fatal(err)
}
// Snapshot succeeds → delete succeeds → import fails → rollback runs:
// rollback delete (best-effort) + rollback re-import from backup PFX.
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil}, // 0: -exportkeystore (snapshot)
{Output: "", Err: nil}, // 1: -delete (pre-import)
{Output: "keystore corruption error", Err: fmt.Errorf("exit 1")}, // 2: -importkeystore FAILS
{Output: "", Err: nil}, // 3: -delete (rollback step 1)
{Output: "Imported keystore for alias <server>", Err: nil}, // 4: -importkeystore (rollback step 2)
},
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when import fails")
}
// Wrapped error must surface BOTH the import failure AND the rollback
// success ("rolled back from <backupPath>") so operators know they
// don't need to manually recover.
if !strings.Contains(err.Error(), "keytool import failed") {
t.Errorf("expected error to mention import failure, got: %v", err)
}
if !strings.Contains(err.Error(), "rolled back from") {
t.Errorf("expected error to mention rollback from <backup>, got: %v", err)
}
// 5 keytool calls: export, delete, import-fail, rollback-delete,
// rollback-import. Locate the rollback re-import call (call 4) and
// assert it references the backup path.
if len(mock.calls) != 5 {
t.Fatalf("expected 5 keytool calls (export, delete, import, rollback-delete, rollback-import), got %d", len(mock.calls))
}
rollbackImportArgs := strings.Join(mock.calls[4].Args, " ")
if !strings.Contains(rollbackImportArgs, "-importkeystore") {
t.Errorf("call 4: expected -importkeystore for rollback, got: %s", rollbackImportArgs)
}
if !strings.Contains(rollbackImportArgs, ".certctl-bak.") {
t.Errorf("call 4: expected backup path (.certctl-bak.*) as -srckeystore, got: %s", rollbackImportArgs)
}
// The same backup path that the snapshot wrote should be the source
// for the rollback re-import — verify by extracting both and comparing.
exportArgs := strings.Join(mock.calls[0].Args, " ")
for _, arg := range mock.calls[0].Args {
if strings.Contains(arg, ".certctl-bak.") && strings.HasSuffix(arg, ".p12") {
if !strings.Contains(rollbackImportArgs, arg) {
t.Errorf("rollback re-import did not reference snapshot backup %q\n export args: %s\n rollback args: %s", arg, exportArgs, rollbackImportArgs)
}
break
}
}
}
func TestJKS_ImportFails_RollbackAlsoFails_OperatorActionable(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil {
t.Fatal(err)
}
// Snapshot → delete → import-fail → rollback-delete → rollback-import
// ALSO fails. Operator-actionable case: BOTH errors AND the backup
// path must be in the wrapped error so operators can manually recover
// from the .p12 file on disk.
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil}, // 0: snapshot
{Output: "", Err: nil}, // 1: pre-import delete
{Output: "import-step error", Err: fmt.Errorf("import exit 1")}, // 2: import FAILS
{Output: "", Err: nil}, // 3: rollback delete
{Output: "rollback-step error", Err: fmt.Errorf("rollback exit 2")}, // 4: rollback import FAILS
},
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when both import and rollback fail")
}
// Wrapped error must mention BOTH errors AND the backup path so the
// operator can manually recover.
if !strings.Contains(err.Error(), "keytool import failed") {
t.Errorf("expected error to mention import failure, got: %v", err)
}
if !strings.Contains(err.Error(), "rollback also failed") {
t.Errorf("expected error to mention rollback failure, got: %v", err)
}
if !strings.Contains(err.Error(), "manual operator inspection required") {
t.Errorf("expected error to flag manual inspection, got: %v", err)
}
if !strings.Contains(err.Error(), ".certctl-bak.") {
t.Errorf("expected error to surface the backup path so operator can recover manually, got: %v", err)
}
}
func TestJKS_BackupRetention_PrunesOldBackups(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil {
t.Fatal(err)
}
// Pre-create 5 backup files with staggered ModTimes so the pruner
// has a deterministic newest-first ordering. The deploy will create
// a 6th; with retention=3, pruning keeps the 3 newest (which are
// the deploy-created backup + the two newest pre-existing).
preExistingNames := []string{
".certctl-bak.100000000.p12", // oldest
".certctl-bak.200000000.p12",
".certctl-bak.300000000.p12",
".certctl-bak.400000000.p12",
".certctl-bak.500000000.p12", // newest pre-existing
}
baseTime := time.Now().Add(-24 * time.Hour)
for i, name := range preExistingNames {
path := tmpDir + "/" + name
if err := os.WriteFile(path, []byte("backup"), 0644); err != nil {
t.Fatal(err)
}
// Stagger ModTime: oldest = baseTime, newest = baseTime + 4 hours.
modTime := baseTime.Add(time.Duration(i) * time.Hour)
if err := os.Chtimes(path, modTime, modTime); err != nil {
t.Fatal(err)
}
}
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil}, // export
{Output: "", Err: nil}, // delete
{Output: "Import command completed", Err: nil}, // import
},
onCall: simulateExportSideEffect(t),
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
BackupRetention: 3,
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
// Count remaining .certctl-bak.*.p12 files. Should be exactly 3:
// - the newest pre-existing (500000000) — survives
// - the second-newest pre-existing (400000000) — survives
// - the deploy-created backup — survives (just-now ModTime is the
// newest of all)
// The 3 oldest pre-existing (300000000, 200000000, 100000000) are
// pruned.
entries, err := os.ReadDir(tmpDir)
if err != nil {
t.Fatal(err)
}
var remaining []string
for _, e := range entries {
name := e.Name()
if strings.HasPrefix(name, ".certctl-bak.") && strings.HasSuffix(name, ".p12") {
remaining = append(remaining, name)
}
}
if len(remaining) != 3 {
t.Errorf("expected exactly 3 backups after pruning (BackupRetention=3), got %d: %v", len(remaining), remaining)
}
// The two newest pre-existing must survive.
for _, want := range []string{".certctl-bak.500000000.p12", ".certctl-bak.400000000.p12"} {
found := false
for _, got := range remaining {
if got == want {
found = true
break
}
}
if !found {
t.Errorf("expected %s to survive pruning, got remaining: %v", want, remaining)
}
}
// The three oldest pre-existing must be pruned.
for _, gone := range []string{".certctl-bak.100000000.p12", ".certctl-bak.200000000.p12", ".certctl-bak.300000000.p12"} {
for _, got := range remaining {
if got == gone {
t.Errorf("expected %s to be pruned, but it remained: %v", gone, remaining)
}
}
}
}
func TestJKS_BackupRetention_Zero_DefaultsTo3(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil {
t.Fatal(err)
}
// Pre-create 5 staggered backups; with retention=0 (which defaults
// to 3), the pruner should behave identically to the explicit-3 test.
baseTime := time.Now().Add(-24 * time.Hour)
for i := 0; i < 5; i++ {
path := tmpDir + "/.certctl-bak." + fmt.Sprintf("%d", (i+1)*100000000) + ".p12"
if err := os.WriteFile(path, []byte("backup"), 0644); err != nil {
t.Fatal(err)
}
modTime := baseTime.Add(time.Duration(i) * time.Hour)
if err := os.Chtimes(path, modTime, modTime); err != nil {
t.Fatal(err)
}
}
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil},
{Output: "", Err: nil},
{Output: "Import command completed", Err: nil},
},
onCall: simulateExportSideEffect(t),
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
BackupRetention: 0, // explicit zero — must default to 3
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
entries, err := os.ReadDir(tmpDir)
if err != nil {
t.Fatal(err)
}
count := 0
for _, e := range entries {
if strings.HasPrefix(e.Name(), ".certctl-bak.") && strings.HasSuffix(e.Name(), ".p12") {
count++
}
}
if count != 3 {
t.Errorf("expected 3 backups after pruning (BackupRetention=0 → default 3), got %d", count)
}
}
func TestJKS_BackupRetention_Negative_OptsOut(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil {
t.Fatal(err)
}
// Pre-create 5 backups; with retention=-1, NONE are pruned. After the
// deploy creates a 6th, all 6 remain.
baseTime := time.Now().Add(-24 * time.Hour)
for i := 0; i < 5; i++ {
path := tmpDir + "/.certctl-bak." + fmt.Sprintf("%d", (i+1)*100000000) + ".p12"
if err := os.WriteFile(path, []byte("backup"), 0644); err != nil {
t.Fatal(err)
}
modTime := baseTime.Add(time.Duration(i) * time.Hour)
if err := os.Chtimes(path, modTime, modTime); err != nil {
t.Fatal(err)
}
}
mock := &mockExecutor{
responses: []mockResponse{
{Output: "Imported keystore for alias <server>", Err: nil},
{Output: "", Err: nil},
{Output: "Import command completed", Err: nil},
},
onCall: simulateExportSideEffect(t),
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
BackupRetention: -1, // opt out
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
entries, err := os.ReadDir(tmpDir)
if err != nil {
t.Fatal(err)
}
count := 0
for _, e := range entries {
if strings.HasPrefix(e.Name(), ".certctl-bak.") && strings.HasSuffix(e.Name(), ".p12") {
count++
}
}
// 5 pre-existing + 1 deploy-created = 6; retention=-1 means no pruning.
if count != 6 {
t.Errorf("expected 6 backups after deploy with BackupRetention=-1, got %d", count)
}
}
func TestJKS_Snapshot_AliasNotInKeystore_ProceedsCleanly(t *testing.T) {
// Edge case: keystore file exists but the configured alias isn't
// present in it. keytool -exportkeystore returns non-zero with
// "alias <X> does not exist" — the snapshot helper recognises this
// as a normal first-time-on-existing-keystore signal and returns
// ("", nil), letting the deploy proceed without a backup.
// The subsequent import-failure path then becomes the no-backup
// branch (returns the import error verbatim).
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
tmpDir := t.TempDir()
keystorePath := tmpDir + "/app.p12"
if err := os.WriteFile(keystorePath, []byte("fake-keystore-with-other-aliases"), 0644); err != nil {
t.Fatal(err)
}
mock := &mockExecutor{
responses: []mockResponse{
// keytool -exportkeystore: alias not present → non-zero exit
// with the well-known "Alias <server> does not exist" message.
{Output: "keytool error: java.lang.Exception: Alias <server> does not exist", Err: fmt.Errorf("exit 1")},
{Output: "", Err: nil}, // -delete (best-effort, alias may not exist)
{Output: "Import command completed", Err: nil}, // -importkeystore (deploy)
},
}
c := NewWithExecutor(&Config{
KeystorePath: keystorePath,
KeystorePassword: "changeit",
KeystoreType: "PKCS12",
Alias: "server",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err != nil {
t.Fatalf("deploy should succeed when alias not in pre-existing keystore: %v", err)
}
if !result.Success {
t.Error("expected success=true")
}
}