mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-11 21:48:54 +00:00
feat(agent): per-target deploy mutex serializes concurrent deploys to the same target
Phase 2 of the deploy-hardening I master bundle. Closes the agent-side race window where two concurrent renewals against the same target ID (typical: two SAN entries renewing in the same window) would otherwise collide on the connector's temp-file path or run the reload command against itself. The Agent struct grows a sync.Map of *sync.Mutex keyed on target ID; targetDeployMutex(targetID) lazy-init's one on first acquisition. executeDeploymentJob acquires the mutex before connector.DeployCertificate and releases via defer at function exit — the lock spans the full Deploy duration including PreCommit (validate), atomic-rename, PostCommit (reload), and post-deploy verify (Phases 4-9). Granularity per frozen decision 0.5: one mutex per target ID, NOT per (target, cert) pair. Cert deploy throughput is operator-grade tens-per-minute; coarse serialization simplifies reasoning about reload-side race windows. Mutexes live for the agent's lifetime — target IDs are bounded so no janitor needed (~16 bytes per entry). Empty TargetID (defensive — should never happen for deploy jobs) bypasses the lock to avoid a singleton serialization point pulling all targetless work onto a shared mutex. Tests (5 named cases in cmd/agent/deploy_mutex_test.go): - TestAgent_ConcurrentDeploysToSameTarget_Serialize — race-detector smoke; 10 goroutines acquire same target's mutex; max-in-flight asserts == 1 - TestAgent_DifferentTargetIDs_ParallelizeIndependently — per-target granularity proof - TestAgent_EmptyTargetID_ReturnsNilMutex — defensive contract - TestAgent_TargetMutex_IsStable — sync.Map LoadOrStore returns same pointer across calls - TestAgent_TargetMutex_RaceLookup — race-free under N=50 concurrent lookups for same key go test -race -count=1 green; gofmt + go vet + golangci-lint v2.11.4 all 0 issues against my new code (pre-existing import-grouping drift in agent_test.go / main.go / verify*.go is unrelated to this change and not caught by `go fmt ./...` which CI uses). Phase 3 next: ValidateOnly method on target.Connector interface; default impl returns ErrValidateOnlyNotSupported across all 13 connectors.
This commit is contained in:
@@ -95,6 +95,47 @@ type Agent struct {
|
||||
// race with ctx.Done() and other cases.
|
||||
retiredOnce sync.Once
|
||||
retiredSignal chan struct{}
|
||||
|
||||
// Deploy-hardening I Phase 2: per-target deploy mutex.
|
||||
// Two cert renewals against the same target ID (e.g., two SAN
|
||||
// entries renewing in the same window, or a fast-cycling
|
||||
// renewal-then-test workflow) MUST serialize at the agent
|
||||
// dispatch site. Without this lock, the underlying connector's
|
||||
// temp-file path could collide and the reload command would
|
||||
// race against itself.
|
||||
//
|
||||
// Granularity is one mutex per target ID, NOT per (target, cert)
|
||||
// pair — frozen decision 0.5. Cert deploy throughput is
|
||||
// operator-grade tens-per-minute; coarse serialization is fine
|
||||
// and simplifies reasoning about reload-side race windows.
|
||||
//
|
||||
// sync.Map is sized for thousands of unique target IDs without
|
||||
// rehash thrash; LoadOrStore is atomic + lock-free on the
|
||||
// hot path. Mutexes live for the agent's lifetime — no janitor
|
||||
// because target IDs are bounded and the per-target memory
|
||||
// (~16 bytes per entry) is negligible vs. typical agent heap.
|
||||
//
|
||||
// Job items without a TargetID (e.g., agent-managed cert + no
|
||||
// connector dispatch — should never happen for deploy jobs but
|
||||
// defended anyway) bypass the lock to avoid a singleton
|
||||
// serialization point.
|
||||
deployMutexes sync.Map // map[string]*sync.Mutex, keyed on JobItem.TargetID
|
||||
}
|
||||
|
||||
// targetDeployMutex returns the per-target-ID *sync.Mutex,
|
||||
// lazy-initialising one on first acquisition. Returns nil when
|
||||
// targetID is empty (caller should skip the lock entirely).
|
||||
//
|
||||
// Phase 2 of the deploy-hardening I master bundle: the load-bearing
|
||||
// serialization point that defends against concurrent deploys to the
|
||||
// same target stomping each other's temp-file paths or reload
|
||||
// commands.
|
||||
func (a *Agent) targetDeployMutex(targetID string) *sync.Mutex {
|
||||
if targetID == "" {
|
||||
return nil
|
||||
}
|
||||
v, _ := a.deployMutexes.LoadOrStore(targetID, &sync.Mutex{})
|
||||
return v.(*sync.Mutex)
|
||||
}
|
||||
|
||||
// WorkResponse represents the response from the work polling endpoint.
|
||||
@@ -667,6 +708,22 @@ func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
|
||||
},
|
||||
}
|
||||
|
||||
// Phase 2 of the deploy-hardening I master bundle:
|
||||
// per-target deploy mutex. Acquire BEFORE
|
||||
// DeployCertificate so two concurrent renewals against
|
||||
// the same target ID serialize. The lock is held for the
|
||||
// full Deploy duration including PreCommit (validate),
|
||||
// PostCommit (reload), and post-deploy verify (Phases
|
||||
// 4-9). Released on every return path via defer.
|
||||
var targetID string
|
||||
if job.TargetID != nil {
|
||||
targetID = *job.TargetID
|
||||
}
|
||||
if mu := a.targetDeployMutex(targetID); mu != nil {
|
||||
mu.Lock()
|
||||
defer mu.Unlock()
|
||||
}
|
||||
|
||||
result, err := connector.DeployCertificate(ctx, deployReq)
|
||||
if err != nil {
|
||||
a.logger.Error("deployment failed",
|
||||
|
||||
Reference in New Issue
Block a user