certctl/internal/deploy/doc.go
shankar0123 f5c67a51b2 feat(deploy): atomic write + validate + rollback primitive shared across all target connectors
Phase 1 of the deploy-hardening I master bundle. Delivers the load-bearing
prerequisite for the seven Bundle I items by extracting one canonical
atomic-deploy primitive at internal/deploy/ that all 13 target connectors
will consume in Phases 4-9.

The package ships:

- Plan + Apply API: write all File entries to sibling .certctl-tmp.<nanos>
  in the destination directory (same-filesystem guarantees os.Rename atomicity),
  call PreCommit (validate-with-the-target), atomic-rename all temps to final,
  call PostCommit (reload). On PostCommit failure, restore from pre-deploy
  backups + re-call PostCommit. If second PostCommit also fails, return
  ErrRollbackFailed (operator-actionable; loudly documented).

- AtomicWriteFile lower-level entry for connectors that don't fit the Plan
  model (F5, K8s — they ship bytes through APIs, not local files).

- SHA-256 idempotency: every Apply short-circuits when all File destinations
  already match SHA-256 of new bytes. Defends against agent-restart retry
  storms hammering targets with no-op reloads.

- Ownership + mode preservation: existing nginx:nginx 0640 stays
  nginx:nginx 0640 across renewals. Per-target FileDefaults applies for
  first-deploy. Per-File explicit Mode/Owner/Group overrides win over both.
  Closes the silent-failure mode where os.WriteFile(path, bytes, 0600) at
  apache.go:119 (et al.) clobbered worker access.

- Backup retention janitor: pre-deploy backup at <path>.certctl-bak.<nanos>;
  default keeps last 3 (DefaultBackupRetention); BackupRetention=-1 disables
  backups (rollback impossible — documented foot-gun).

- File-level mutex via sync.Map: two concurrent Apply calls touching the
  same destination serialize. Per-target serialization (Phase 2) is
  coarser-grained and lives at the agent dispatch layer; this is the
  file-level guard.

- Sentinel errors for connector errors.Is checks:
  ErrPlanInvalid, ErrValidateFailed, ErrReloadFailed, ErrRollbackFailed.

Tests (37 named cases across deploy_test.go + coverage_test.go) pin every
load-bearing invariant the prompt's Phase 1 requires, plus error-leg
coverage uplifts:

- TestApply_HappyPath_PreCommitSucceeds_PostCommitSucceeds_FilesAtomic
- TestApply_PreCommitFails_NoFilesChanged (atomic-or-nothing on validate)
- TestApply_PostCommitFails_FilesRolledBack (rollback wire)
- TestApply_RollbackAlsoFails_ReturnsErrRollbackFailed (escalation path)
- TestApply_IdempotentSkip_SHA256Match (idempotency short-circuit)
- TestApply_PreservesExistingOwnerAndMode_WhenNotOverridden
- TestApply_RespectsOverrides_OwnerGroupMode
- TestApply_ConcurrentApplyToSameFile_Serializes (file-level lock)
- TestApply_BackupRetention_KeepsLastN (janitor pruning)
- TestApply_NoExistingFile_UsesDefaultsForOwnerGroupMode
- TestAtomicWriteFile_TempFileCleanedUpOnError
- TestAtomicWriteFile_RenameRaceWithReader_AtomicReadAlwaysSeesOldOrNew
  (POSIX-rename atomicity proof via concurrent reader)

Plus white-box tests for resolveOwnership, lookupUID/GID, and deeper error
legs in restoreFromBackups + applyOwnership + AtomicWriteFile.

Coverage 87.3% — practical ceiling without injecting a fault-aware FS
abstraction (Write/Sync/Close OS errors are unreachable from go test
without sudo'd disk-fill or a custom interface seam). Above the existing
service-layer 70% floor; Phases 4-9 will lift this further as they exercise
the package through real-connector use.

Race detector clean; gofmt + go vet + golangci-lint v2.11.4 all 0 issues.

The package is the load-bearing prerequisite for Phases 4-9. Phase 2 next:
per-target deploy mutex in cmd/agent/main.go.

Spec: cowork/deploy-hardening-i-prompt.md
Baseline + recon: cowork/deploy-hardening-i/baseline.md
2026-04-30 14:29:19 +00:00

70 lines
3.9 KiB
Go

// Package deploy provides the shared atomic-write + validate + rollback
// primitive consumed by every target connector under
// internal/connector/target/*.
//
// The deploy package targets the three procurement-checklist items where
// commercial competitors (Venafi, DigiCert Certificate Manager, Sectigo)
// historically beat certctl on a head-to-head deployment-grade
// comparison. It implements the first, supplies the rollback hook for
// the second, and leaves the third out of scope:
//
// 1. Atomic deploy with rollback — every file write is "all or nothing".
// A connector can never leave a target in a half-deployed state where
// the cert is updated but the chain isn't (or vice versa). Ships via
// Plan + Apply: temp-write all files together, run validate, atomic
// rename them all, run reload; on reload failure restore previous
// bytes + reload again.
// 2. Post-deploy TLS verification — the Apply caller wires its own
// PostCommit to do a TLS handshake against the target endpoint and
// compare the leaf-cert SHA-256 against what was just written. The
// deploy package runs the rollback path when PostCommit fails;
// the connector decides what failure means.
// 3. (Vendor-specific deployment recipes — out of scope for the deploy
// package; covered in Bundle II.)
//
// Design tenets — all load-bearing for 13 connectors:
//
// - All-or-nothing across files. A Plan with N File entries either
// succeeds for all N or rolls back all N. No "two of three written"
// intermediate states are possible from a successful or failed Apply.
// - Cross-filesystem safety. Temp files always live in the same
// directory as the final destination, so os.Rename stays within one
// filesystem and is atomic on POSIX. Staging temp files in /tmp
// could put them on a different filesystem, where os.Rename fails
// with EXDEV and any copy-based fallback would not be atomic.
// - Idempotency. If every File's destination already has identical
// bytes (SHA-256 match), Apply returns SkippedAsIdempotent=true and
// calls neither PreCommit nor PostCommit. Defends against agent
// restart retry storms that would otherwise hammer the target with
// no-op reloads.
// - Ownership + mode preservation. The single most common
// silent-failure mode in cert deploys is the agent running as root
// calling os.WriteFile(path, bytes, 0600), which clobbers the
// existing nginx:nginx 0640 ownership and locks NGINX out of the
// key file. Apply preserves the existing destination's
// owner+group+mode unless the per-target config overrides; for new
// files it falls back to per-target-type defaults (e.g. nginx:nginx
// 0640).
// - Per-file serialization. The package keeps a sync.Map of file-level
// mutexes so two concurrent Apply calls touching the same path
// serialize. (Per-target serialization is Phase 2's job in the
// agent dispatch; this is a finer-grained file-level guard.)
// - Backup retention. Before a destination is overwritten, its previous
// bytes are copied to <path>.certctl-bak.<unix-nanos>. A janitor
// prunes to the last N backups (default 3, configurable via
// Plan.BackupRetention or the CERTCTL_DEPLOY_BACKUP_RETENTION env var
// the agent passes in). Setting retention to -1 disables backups
// entirely, which makes rollback impossible; documented as a foot-gun.
//
// Origin: this package was created in the deploy-hardening I master
// bundle (Phase 1) as the load-bearing replacement for the duplicated
// os.WriteFile flows in 13 connectors. The Apply API mirrors the F5
// transaction model already at internal/connector/target/f5/f5.go:267
// — F5 was the only connector with rollback semantics before this
// bundle. Apply lifts that pattern up so every other connector gets
// the same atomicity bar without re-implementing it.
//
// Concurrency: every exported function is safe for concurrent callers.
// File-level serialization is automatic via the package-internal
// sync.Map of mutexes; callers do not need their own per-file lock.
package deploy