refactor(cmd/agent): split main.go into poll + deploy + discovery sibling files (Phase 9, 12 of N — LAST hotspot)

Phase 9 ARCH-M2 closure Sprint 12: the LAST of the audit's named hotspot
sub-splits. Splits cmd/agent/main.go (1489 LOC, the sixth-largest backend
hotspot at audit time) via the Option B sibling-file pattern (mirrors the
Sprint 8 cmd/server cut). Package stays `main`; every method is still
defined on *Agent, so each call site continues to resolve through Go's
same-package method set, with no import-path or signature change.

Audit prescription vs reality
=============================

The audit's Tasks-Deferred row prescribed "main + poll + deploy + register
sibling files." The actual cmd/agent/main.go has no `register` function:
agent registration happens via the control-plane REST API
(POST /api/v1/agents) before the agent process starts. The closest analogue
in the agent binary is the filesystem-discovery scan (runDiscoveryScan + the
parsePEMFile / parseDERFile / certToEntry / sha256Sum / certKeyInfo
helpers), which is the agent's other "outbound report-to-server" surface
alongside the inbound work-poll path. Sprint 12 substitutes `discovery` for
`register` in the prescription and keeps the other three buckets as named:
`main` (lifecycle + HTTP infrastructure + entrypoint), `poll` (work-poll +
CSR-job execution), `deploy` (deployment-job execution + target connector
factory).

What moved
==========

New `cmd/agent/poll.go` (279 LOC), work-poll + CSR-job execution:

- pollForWork: GET /api/v1/agents/{id}/work each tick; dispatches each
  returned JobItem to the right executor.
- executeCSRJob: handles AwaitingCSR jobs by generating an ECDSA P-256 key
  locally, persisting it with 0600 permissions (the key NEVER leaves the
  agent; see CLAUDE.md "Agent-based key management"), then creating and
  submitting the CSR.

New `cmd/agent/deploy.go` (443 LOC), deployment + target factory:

- executeDeploymentJob: handles Pending deployment jobs by fetching the
  cert PEM, loading the locally-held private key (agent keygen mode),
  instantiating the appropriate target connector, calling
  DeployCertificate, and reporting status.
- createTargetConnector: the 170-LOC switch over target_type that
  instantiates connectors from 15 packages (apache / awsacm / azurekv /
  caddy / envoy / f5 / haproxy / iis / javakeystore / k8ssecret / nginx /
  postfix / ssh / traefik / wincertstore). The switch has 16 target_type
  cases because "Dovecot" reuses the postfix package. Context is threaded
  through to SDK-driven connectors (AWSACM, AzureKeyVault) per the
  contextcheck linter fix in CI commit 502823d.
- splitPEMChain + fetchCertificate (deploy-only helpers).

New `cmd/agent/discovery.go` (275 LOC), filesystem cert discovery:

- runDiscoveryScan: walks each configured discovery directory, dispatches
  each candidate file to parsePEMFile / parseDERFile, batches the parsed
  entries, and POSTs them to /api/v1/agents/{id}/discoveries (the
  machine-to-machine surface that is intentionally NOT exposed via MCP).
- parsePEMFile + parseDERFile + certToEntry + sha256Sum + certKeyInfo + the
  discoveredCertEntry struct that ties them together.

What stays in main.go (644 LOC, down from 1489)
================================================

- Types: AgentConfig, Agent struct, ErrAgentRetired var, WorkResponse,
  JobItem.
- Lifecycle: NewAgent constructor, Run, markRetired, sendHeartbeat,
  getOutboundIP, targetDeployMutex method.
- Shared HTTP infrastructure: makeRequest (consumed by poll + deploy +
  discovery + lifecycle), reportJobStatus (consumed by poll + deploy).
- Entrypoint: main(), getEnvDefault, getEnvBoolDefault,
  validateHTTPSScheme.

Side-effect import cleanup
==========================

24 imports drop from cmd/agent/main.go as a clean side effect (the two
import hunks in the main.go diff remove 7 + 17 import lines):

Standard library (8):
- crypto/ecdsa (poll + discovery)
- crypto/elliptic (poll only)
- crypto/rand (poll only)
- crypto/rsa (discovery only)
- crypto/sha256 (discovery only)
- crypto/x509/pkix (poll only)
- encoding/pem (poll + deploy + discovery)
- path/filepath (poll + deploy + discovery)

Internal packages (16):
- internal/connector/target plus 15 connector packages: apache + awsacm +
  azurekv + caddy + envoy + f5 + haproxy + iis + javakeystore + k8ssecret +
  nginx + postfix + ssh + traefik + wincertstore. All 16 were used ONLY by
  the deployment path (createTargetConnector, plus the
  target.DeploymentRequest built in executeDeploymentJob) and moved with it
  to deploy.go.

The surviving main.go now imports 20 stdlib packages + zero internal
packages, the leanest the agent binary's entrypoint has been since the
agent first shipped target-connector orchestration.

Per-import audit on every new sibling file is in the diff:
- poll.go: context, crypto/ecdsa, crypto/elliptic, crypto/rand,
  crypto/x509, crypto/x509/pkix, encoding/json, encoding/pem, fmt, io,
  net/http, os, path/filepath, strings (no sync: the sync.Once /
  sync.Mutex / sync.Map usages all live in the surviving main.go's
  lifecycle code).
- deploy.go: context, encoding/json, encoding/pem, fmt, io, net/http, os,
  path/filepath, strings + target + 15 connector packages.
- discovery.go: context, crypto/ecdsa, crypto/rsa, crypto/sha256,
  crypto/x509, encoding/pem, fmt, io, net/http, os, path/filepath, strings,
  time.

Net effect
==========

main.go: 1489 → 644 LOC (-845 = -56.7%). Three new sibling files at 997 LOC
total (845 moved + ~152 LOC of header + Phase 9 doc-comment overhead).
Matches the Sprint 8 cmd/server pattern in shape (main + wire + migrations);
the size reduction is larger (-23.8% there vs -56.7% here) because the agent
had more concentrated single-purpose functions than the server's
wiring-heavy main.
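
A per-import audit like the one listed above can be reproduced mechanically. The following is a minimal sketch using the standard go/parser package in ImportsOnly mode; the helper name `listImports`, the inline `sampleSource`, and the `example.com/...` import path are illustrative assumptions, not part of this commit.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// sampleSource is an illustrative stand-in for a sibling file; it is not the
// real content of any file in this commit.
const sampleSource = `package main

import (
	"context"
	"encoding/pem"

	jks "example.com/connector/javakeystore"
)
`

// listImports parses only the import declarations of src and returns the
// unquoted import paths in source order.
func listImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	paths := make([]string, 0, len(file.Imports))
	for _, imp := range file.Imports {
		// imp.Path.Value is the quoted string literal from the source.
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	paths, err := listImports("sample.go", sampleSource)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Because ImportsOnly stops after the import block and performs no type checking, the sketch works on files whose dependencies are not available locally.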

Cumulative Phase 9 progress (all 6 named hotspots)
==================================================

config.go            3403 → 1342  (-60.6%, Sprints 1-7)
cmd/server/main.go   2966 → 2260  (-23.8%, Sprints 8 + 8b)
service/acme.go      1965 → 1162  (-40.9%, Sprints 9 + 9b)
mcp/tools.go         1867 →  109  (-94.2%, Sprint 10)
auth_session_oidc    1577 →  452  (-71.3%, Sprint 11)
cmd/agent/main.go    1489 →  644  (-56.7%, Sprint 12)

TOTAL across 6 files: 13,267 → 5,969 LOC = -7,298 (-55.0%)

Five of the 6 named hotspots from the audit's top-6 list are now below
1,500 LOC. The exception, and the largest remaining hotspot from the top-6,
is cmd/server/main.go at 2,260 LOC (intentional: every backend service the
server wires is one line in main(), so the size is roughly proportional to
surface area, not concern-tangling).

Behavior preservation contract
==============================

1. gofmt -l clean across all 4 affected files.
2. go vet ./cmd/agent/... reports no findings.
3. staticcheck ./cmd/agent/... reports no findings.
4. go test -short -count=1 ./cmd/agent/... is green (includes the
   agent_test.go 1716-LOC suite that pins every moved function:
   pollForWork / executeCSRJob / executeDeploymentJob /
   createTargetConnector / runDiscoveryScan, plus dispatch_test.go,
   deploy_mutex_test.go, keymem_test.go).
5. Broader-importer build green: go build ./...

Same-package resolution means every cross-file call (poll → makeRequest,
deploy → makeRequest + reportJobStatus + verifyAndReportDeployment in
verify.go, discovery → makeRequest) resolves through Go's package-level
method set with zero compile-time cost + zero runtime overhead. The public
surface of the cmd/agent binary is unchanged.

What this commit closes
=======================

Sprint 12 is the LAST of the audit's named top-6 hotspot sub-splits. The
ARCH-M2 finding now reflects:

- 5 of 6 named backend hotspots below 1,500 LOC; cmd/server/main.go stays
  at 2,260 LOC by design (see above).
- 24 of 24 named sub-splits shipped across Sprints 1-12 (config family ×7 +
  cmd/server ×2 + service/acme ×2 + mcp/tools ×6 + auth_session_oidc ×4 +
  cmd/agent ×3).
- 7,298 LOC of code-locality concentration removed across the top 6 files.

Whether to flip ARCH-M2 from 🛠 Scaffolded to ✓ Shipped is now an
operator-discretion call: every named target landed, but the finding's
spirit ("split god-files by responsibility") is a continuous discipline
rather than a binary done/not-done.

Refs: ARCH-M2 (god-files), Phase 9 audit. Sprint 12 is the named-hotspot
conclusion of Phase 9.
2026-07-27 22:38:55 +00:00 · 2026-05-14 10:36:08 +00:00
parent cd374b243e
commit 3094010880
4 changed files with 996 additions and 870 deletions
@@ -0,0 +1,443 @@
 // Copyright 2026 certctl LLC. All rights reserved.
 // SPDX-License-Identifier: BUSL-1.1
 package main
 import (
 	"context"
 	"encoding/json"
 	"encoding/pem"
 	"fmt"
 	"io"
 	"net/http"
 	"os"
 	"path/filepath"
 	"strings"
 	"github.com/certctl-io/certctl/internal/connector/target"
 	"github.com/certctl-io/certctl/internal/connector/target/apache"
 	"github.com/certctl-io/certctl/internal/connector/target/awsacm"
 	"github.com/certctl-io/certctl/internal/connector/target/azurekv"
 	"github.com/certctl-io/certctl/internal/connector/target/caddy"
 	"github.com/certctl-io/certctl/internal/connector/target/envoy"
 	"github.com/certctl-io/certctl/internal/connector/target/f5"
 	"github.com/certctl-io/certctl/internal/connector/target/haproxy"
 	"github.com/certctl-io/certctl/internal/connector/target/iis"
 	jks "github.com/certctl-io/certctl/internal/connector/target/javakeystore"
 	k8s "github.com/certctl-io/certctl/internal/connector/target/k8ssecret"
 	"github.com/certctl-io/certctl/internal/connector/target/nginx"
 	pf "github.com/certctl-io/certctl/internal/connector/target/postfix"
 	sshconn "github.com/certctl-io/certctl/internal/connector/target/ssh"
 	"github.com/certctl-io/certctl/internal/connector/target/traefik"
 	wcs "github.com/certctl-io/certctl/internal/connector/target/wincertstore"
 )
 // Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
 // cmd/agent/main.go via the Option B sibling-file pattern.
 //
 // This file holds the DEPLOYMENT executor + the target connector
 // factory + the deploy-only helpers:
 //
 //   - executeDeploymentJob: handles Pending deployment jobs by
 //     fetching the cert PEM from the control plane, loading the
 //     locally-held private key (in agent keygen mode), instantiating
 //     the appropriate target connector via createTargetConnector,
 //     calling DeployCertificate on it, and reporting Completed or
 //     Failed back to the control plane.
 //   - createTargetConnector: the big switch over target_type that
 //     instantiates a connector from one of 15 packages (apache /
 //     awsacm / azurekv / caddy / envoy / f5 / haproxy / iis /
 //     javakeystore / k8ssecret / nginx / postfix / ssh / traefik /
 //     wincertstore); "Dovecot" reuses the postfix package.
 //     Context is threaded into SDK-driven connectors (AWSACM,
 //     AzureKeyVault) so credential resolution honors caller
 //     cancellation per the contextcheck linter — see CI commit
 //     502823d.
 //   - splitPEMChain: split a PEM chain into (first cert, rest).
 //   - fetchCertificate: pull the PEM chain from
 //     GET /api/v1/agents/{agentID}/certificates/{certID}.
 //
 // All 15 target-connector imports were used ONLY by
 // createTargetConnector; moving the factory here also moved the
 // connector imports out of main.go, leaving the surviving
 // cmd/agent/main.go with the minimal stdlib surface its lifecycle
 // + HTTP infrastructure needs.
 // executeDeploymentJob executes a deployment job by fetching the certificate and deploying it
 // to the target system using the connector that createTargetConnector selects for job.TargetType.
 //
 // The private key is read from the local key store (keyDir/certID.key) rather than fetched
 // from the server (agent keygen mode). If the key file cannot be read, the job is reported as
 // Failed. The deployment includes the locally-held key.
 //
 // Flow:
 // 1. Report job as Running
 // 2. Fetch the certificate PEM from the control plane
 // 3. Load the local private key (agent keygen mode); fail the job if it cannot be read
 // 4. Instantiate the target connector based on target_type from the work response
 // 5. Validate the connector config, then call DeployCertificate on the connector
 // 6. Report job as Completed (or Failed)
 func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
 	a.logger.Info("executing deployment job",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"target_type", job.TargetType)
 	// Report job as running
 	if err := a.reportJobStatus(ctx, job.ID, "Running", ""); err != nil {
 		a.logger.Error("failed to report job running", "error", err)
 	}
 	// Fetch the certificate from the control plane
 	certPEM, err := a.fetchCertificate(ctx, job.CertificateID)
 	if err != nil {
 		a.logger.Error("failed to fetch certificate",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("cert fetch failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("certificate fetched for deployment",
 		"job_id", job.ID,
 		"cert_length", len(certPEM))
 	// Split the PEM chain into the leaf certificate and the remaining chain blocks
 	certOnly, chainPEM := splitPEMChain(certPEM)
 	// Load the locally-stored private key (agent keygen mode); a read failure fails the job
 	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
 	var keyPEM string
 	keyData, err := os.ReadFile(keyPath)
 	if err != nil {
 		a.logger.Error("failed to read local private key for deployment",
 			"job_id", job.ID,
 			"key_path", keyPath,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key read failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	keyPEM = string(keyData)
 	a.logger.Info("loaded local private key for deployment",
 		"job_id", job.ID,
 		"key_path", keyPath)
 	// Deploy to the target using the appropriate connector
 	if job.TargetType != "" {
 		connector, err := a.createTargetConnector(ctx, job.TargetType, job.TargetConfig)
 		if err != nil {
 			a.logger.Error("failed to create target connector",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("connector init failed: %v", err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		// Bundle 1 / RT-C1 closure (2026-05-12): defense in depth. The server
 		// runs internal/connector/target/configcheck.Validate on the way IN
 		// (Create/Update), and rejects shell metacharacters in command-bearing
 		// fields. Re-run the connector's full ValidateConfig here on the way
 		// OUT, before any DeployCertificate call. This catches (a) configs
 		// that pre-date the server-side guard, (b) corruption/tampering of
 		// the encrypted config blob, and (c) per-connector filesystem
 		// invariants (cert dir exists, paths writable) that the server can't
 		// check because the filesystem is on the agent host.
 		if err := connector.ValidateConfig(ctx, job.TargetConfig); err != nil {
 			a.logger.Error("connector config validation failed",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("%s config validation failed: %v", job.TargetType, err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		deployReq := target.DeploymentRequest{
 			CertPEM:      certOnly,
 			KeyPEM:       keyPEM,
 			ChainPEM:     chainPEM,
 			TargetConfig: job.TargetConfig,
 			Metadata: map[string]string{
 				"certificate_id": job.CertificateID,
 				"job_id":         job.ID,
 			},
 		}
 		// Phase 2 of the deploy-hardening I master bundle:
 		// per-target deploy mutex. Acquire BEFORE
 		// DeployCertificate so two concurrent renewals against
 		// the same target ID serialize. The lock is held for the
 		// full Deploy duration including PreCommit (validate),
 		// PostCommit (reload), and post-deploy verify (Phases
 		// 4-9). Released on every return path via defer.
 		var targetID string
 		if job.TargetID != nil {
 			targetID = *job.TargetID
 		}
 		if mu := a.targetDeployMutex(targetID); mu != nil {
 			mu.Lock()
 			defer mu.Unlock()
 		}
 		result, err := connector.DeployCertificate(ctx, deployReq)
 		if err != nil {
 			a.logger.Error("deployment failed",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("deployment failed: %v", err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		a.logger.Info("target connector deployment completed",
 			"job_id", job.ID,
 			"target_type", job.TargetType,
 			"success", result.Success,
 			"message", result.Message)
 		// If verification is enabled, verify the deployment by probing the live TLS endpoint
 		targetHost, targetPort, err := extractTargetHostAndPort(job.TargetConfig)
 		if err != nil {
 			a.logger.Warn("could not extract target host/port for verification",
 				"job_id", job.ID,
 				"error", err)
 		} else {
 			a.verifyAndReportDeployment(ctx, job, targetHost, targetPort, certOnly)
 		}
 	} else {
 		a.logger.Info("no target type specified, skipping connector invocation",
 			"job_id", job.ID)
 	}
 	// Report job as completed
 	if err := a.reportJobStatus(ctx, job.ID, "Completed", ""); err != nil {
 		a.logger.Error("failed to report job completed", "error", err)
 		return
 	}
 	a.logger.Info("deployment job completed", "job_id", job.ID)
 }
 // createTargetConnector instantiates the appropriate target connector based on type.
 // ctx is threaded into SDK-driven connectors (AWSACM, AzureKeyVault) so credential
 // resolution honors caller cancellation / deadlines instead of using a fresh
 // context.Background() (the contextcheck linter enforces this — the original Rank 5
 // implementation used Background() and tripped CI on commit 502823d).
 func (a *Agent) createTargetConnector(ctx context.Context, targetType string, configJSON json.RawMessage) (target.Connector, error) {
 	switch targetType {
 	case "NGINX":
 		var cfg nginx.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid NGINX config: %w", err)
 			}
 		}
 		return nginx.New(&cfg, a.logger), nil
 	case "Apache":
 		var cfg apache.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Apache config: %w", err)
 			}
 		}
 		return apache.New(&cfg, a.logger), nil
 	case "HAProxy":
 		var cfg haproxy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid HAProxy config: %w", err)
 			}
 		}
 		return haproxy.New(&cfg, a.logger), nil
 	case "F5":
 		var cfg f5.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid F5 config: %w", err)
 			}
 		}
 		conn, err := f5.New(&cfg, a.logger)
 		if err != nil {
 			return nil, fmt.Errorf("failed to create F5 connector: %w", err)
 		}
 		return conn, nil
 	case "IIS":
 		var cfg iis.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid IIS config: %w", err)
 			}
 		}
 		return iis.New(&cfg, a.logger)
 	case "Traefik":
 		var cfg traefik.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Traefik config: %w", err)
 			}
 		}
 		return traefik.New(&cfg, a.logger), nil
 	case "Caddy":
 		var cfg caddy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Caddy config: %w", err)
 			}
 		}
 		return caddy.New(&cfg, a.logger), nil
 	case "Envoy":
 		var cfg envoy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Envoy config: %w", err)
 			}
 		}
 		return envoy.New(&cfg, a.logger), nil
 	case "Postfix":
 		var cfg pf.Config
 		cfg.Mode = "postfix"
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Postfix config: %w", err)
 			}
 		}
 		return pf.New(&cfg, a.logger), nil
 	case "Dovecot":
 		var cfg pf.Config
 		cfg.Mode = "dovecot"
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Dovecot config: %w", err)
 			}
 		}
 		return pf.New(&cfg, a.logger), nil
 	case "SSH":
 		var cfg sshconn.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid SSH config: %w", err)
 			}
 		}
 		return sshconn.New(&cfg, a.logger)
 	case "WinCertStore":
 		var cfg wcs.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid WinCertStore config: %w", err)
 			}
 		}
 		return wcs.New(&cfg, a.logger)
 	case "JavaKeystore":
 		var cfg jks.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid JavaKeystore config: %w", err)
 			}
 		}
 		return jks.New(&cfg, a.logger), nil
 	case "KubernetesSecrets":
 		var cfg k8s.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid KubernetesSecrets config: %w", err)
 			}
 		}
 		return k8s.New(&cfg, a.logger)
 	case "AWSACM":
 		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
 		// AWS Certificate Manager target — SDK-driven (no file I/O).
 		// LoadDefaultConfig handles the standard AWS credential chain
 		// (IRSA / EC2 instance profile / SSO / env vars) without any
 		// long-lived creds in connector Config.
 		var cfg awsacm.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid AWSACM config: %w", err)
 			}
 		}
 		return awsacm.New(ctx, &cfg, a.logger)
 	case "AzureKeyVault":
 		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
 		// Azure Key Vault target — SDK-driven (no file I/O).
 		// DefaultAzureCredential handles the standard Azure credential
 		// chain (managed identity / workload identity / env vars / az
 		// CLI fallback). Long-lived service-principal secrets are
 		// supported but discouraged via the credential_mode config.
 		var cfg azurekv.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid AzureKeyVault config: %w", err)
 			}
 		}
 		return azurekv.New(ctx, &cfg, a.logger)
 	default:
 		return nil, fmt.Errorf("unsupported target type: %s", targetType)
 	}
 }
 // splitPEMChain splits a PEM chain into the first certificate (cert) and the rest (chain).
 // The control plane returns the full chain as a single string with PEM blocks concatenated.
 func splitPEMChain(pemChain string) (string, string) {
 	data := []byte(pemChain)
 	block, rest := pem.Decode(data)
 	if block == nil {
 		return pemChain, ""
 	}
 	cert := string(pem.EncodeToMemory(block))
 	// Trim whitespace around the remaining chain blocks; the result is "" for a single-cert input
 	return cert, strings.TrimSpace(string(rest))
 }
 // fetchCertificate retrieves the certificate PEM chain from the control plane.
 // GET /api/v1/agents/{agentID}/certificates/{certID}
 func (a *Agent) fetchCertificate(ctx context.Context, certID string) (string, error) {
 	path := fmt.Sprintf("/api/v1/agents/%s/certificates/%s", a.config.AgentID, certID)
 	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
 	if err != nil {
 		return "", fmt.Errorf("request failed: %w", err)
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
 		body, _ := io.ReadAll(resp.Body)
 		return "", fmt.Errorf("server returned %d: %s", resp.StatusCode, string(body))
 	}
 	var certResp struct {
 		CertificatePEM string `json:"certificate_pem"`
 	}
 	if err := json.NewDecoder(resp.Body).Decode(&certResp); err != nil {
 		return "", fmt.Errorf("failed to decode response: %w", err)
 	}
 	return certResp.CertificatePEM, nil
 }
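
The per-target deploy mutex acquired in executeDeploymentJob relies on targetDeployMutex, which stays in main.go and is not shown in this diff. The sketch below is one plausible shape for such a registry, assuming a sync.Map keyed by target ID and a nil result for an empty ID (consistent with the nil check at the call site). The `deployLocks` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// deployLocks is a hypothetical registry that hands out one mutex per target ID.
type deployLocks struct {
	byTarget sync.Map // target ID (string) -> *sync.Mutex
}

// forTarget returns the mutex for targetID, creating it on first use.
// An empty ID yields nil, so callers without a target skip locking.
func (d *deployLocks) forTarget(targetID string) *sync.Mutex {
	if targetID == "" {
		return nil
	}
	mu, _ := d.byTarget.LoadOrStore(targetID, &sync.Mutex{})
	return mu.(*sync.Mutex)
}

// deploy runs critical while holding the target's mutex, mirroring the
// Lock / defer Unlock pattern used around DeployCertificate.
func deploy(d *deployLocks, targetID string, critical func()) {
	if mu := d.forTarget(targetID); mu != nil {
		mu.Lock()
		defer mu.Unlock()
	}
	critical()
}

func main() {
	var locks deployLocks
	var wg sync.WaitGroup
	counter := 0 // guarded by the "target-a" mutex

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			deploy(&locks, "target-a", func() { counter++ })
		}()
	}
	wg.Wait()
	fmt.Println("deployments serialized:", counter) // prints "deployments serialized: 100"
}
```

LoadOrStore makes first-use creation race-free: two goroutines asking for the same target ID at once both receive the same mutex, and the losing allocation is discarded.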
@@ -0,0 +1,275 @@
 // Copyright 2026 certctl LLC. All rights reserved.
 // SPDX-License-Identifier: BUSL-1.1
 package main
 import (
 	"context"
 	"crypto/ecdsa"
 	"crypto/rsa"
 	"crypto/sha256"
 	"crypto/x509"
 	"encoding/pem"
 	"fmt"
 	"io"
 	"net/http"
 	"os"
 	"path/filepath"
 	"strings"
 	"time"
 )
 // Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
 // cmd/agent/main.go via the Option B sibling-file pattern.
 //
 // This file holds the filesystem DISCOVERY scan — the agent's
 // outbound surface for reporting pre-existing certificates it
 // finds on disk back to the control plane (POST /api/v1/agents/
 // {id}/discoveries, a machine-to-machine flow NOT exposed via the
 // MCP surface per the comment in
 // internal/mcp/tools.go::RegisterTools):
 //
 //   - runDiscoveryScan: walks each configured discovery directory,
 //     dispatches each candidate file to parsePEMFile or parseDERFile
 //     depending on extension, batches the parsed entries, and POSTs
 //     them in one report.
 //   - parsePEMFile / parseDERFile: extract every X.509 certificate
 //     from a candidate file in either encoding.
 //   - certToEntry: project a parsed *x509.Certificate into the
 //     discoveredCertEntry shape the control plane expects.
 //   - discoveredCertEntry struct + sha256Sum + certKeyInfo helpers
 //     consumed only by the discovery path; co-locating them keeps
 //     this file self-contained.
 // runDiscoveryScan walks configured directories, parses certificate files, and reports
 // discovered certificates to the control plane.
 // Supports PEM and DER encoded X.509 certificates.
 func (a *Agent) runDiscoveryScan(ctx context.Context) {
 	a.logger.Info("starting filesystem certificate discovery scan",
 		"directories", a.config.DiscoveryDirs)
 	startTime := time.Now()
 	var certs []discoveredCertEntry
 	var scanErrors []string
 	for _, dir := range a.config.DiscoveryDirs {
 		a.logger.Debug("scanning directory", "path", dir)
 		err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
 			if err != nil {
 				scanErrors = append(scanErrors, fmt.Sprintf("walk error at %s: %v", path, err))
 				return nil // continue walking
 			}
 			if info.IsDir() {
 				return nil
 			}
 			// Skip files larger than 1MB (unlikely to be a certificate)
 			if info.Size() > 1*1024*1024 {
 				return nil
 			}
 			// Check file extension
 			ext := strings.ToLower(filepath.Ext(path))
 			switch ext {
 			case ".pem", ".crt", ".cer", ".cert":
 				found := a.parsePEMFile(path)
 				certs = append(certs, found...)
 			case ".der":
 				if entry, err := a.parseDERFile(path); err == nil {
 					certs = append(certs, entry)
 				} else {
 					a.logger.Debug("skipping non-cert DER file", "path", path, "error", err)
 				}
 			default:
 				// Skip key files and extensionless files; try PEM parsing for any other unknown extension
 				if ext == "" || ext == ".key" {
 					return nil
 				}
 				found := a.parsePEMFile(path)
 				if len(found) > 0 {
 					certs = append(certs, found...)
 				}
 			}
 			return nil
 		})
 		if err != nil {
 			scanErrors = append(scanErrors, fmt.Sprintf("failed to walk %s: %v", dir, err))
 		}
 	}
 	scanDuration := time.Since(startTime)
 	a.logger.Info("discovery scan completed",
 		"certificates_found", len(certs),
 		"errors", len(scanErrors),
 		"duration_ms", scanDuration.Milliseconds())
 	if len(certs) == 0 && len(scanErrors) == 0 {
 		a.logger.Debug("no certificates found and no errors, skipping report")
 		return
 	}
 	// Build report payload
 	entries := make([]map[string]interface{}, len(certs))
 	for i, c := range certs {
 		entries[i] = map[string]interface{}{
 			"fingerprint_sha256": c.FingerprintSHA256,
 			"common_name":        c.CommonName,
 			"sans":               c.SANs,
 			"serial_number":      c.SerialNumber,
 			"issuer_dn":          c.IssuerDN,
 			"subject_dn":         c.SubjectDN,
 			"not_before":         c.NotBefore,
 			"not_after":          c.NotAfter,
 			"key_algorithm":      c.KeyAlgorithm,
 			"key_size":           c.KeySize,
 			"is_ca":              c.IsCA,
 			"pem_data":           c.PEMData,
 			"source_path":        c.SourcePath,
 			"source_format":      c.SourceFormat,
 		}
 	}
 	report := map[string]interface{}{
 		"agent_id":         a.config.AgentID,
 		"directories":      a.config.DiscoveryDirs,
 		"certificates":     entries,
 		"errors":           scanErrors,
 		"scan_duration_ms": int(scanDuration.Milliseconds()),
 	}
 	// Submit to control plane
 	path := fmt.Sprintf("/api/v1/agents/%s/discoveries", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodPost, path, report)
 	if err != nil {
 		a.logger.Error("failed to submit discovery report", "error", err)
 		return
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusAccepted {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("discovery report rejected",
 			"status", resp.StatusCode,
 			"body", string(body))
 		return
 	}
 	a.logger.Info("discovery report submitted successfully",
 		"certificates", len(certs),
 		"errors", len(scanErrors))
 }
 // discoveredCertEntry holds parsed certificate metadata for reporting.
 type discoveredCertEntry struct {
 	FingerprintSHA256 string   `json:"fingerprint_sha256"`
 	CommonName        string   `json:"common_name"`
 	SANs              []string `json:"sans"`
 	SerialNumber      string   `json:"serial_number"`
 	IssuerDN          string   `json:"issuer_dn"`
 	SubjectDN         string   `json:"subject_dn"`
 	NotBefore         string   `json:"not_before"`
 	NotAfter          string   `json:"not_after"`
 	KeyAlgorithm      string   `json:"key_algorithm"`
 	KeySize           int      `json:"key_size"`
 	IsCA              bool     `json:"is_ca"`
 	PEMData           string   `json:"pem_data"`
 	SourcePath        string   `json:"source_path"`
 	SourceFormat      string   `json:"source_format"`
 }
 // parsePEMFile reads a file and extracts all X.509 certificates from PEM blocks.
 func (a *Agent) parsePEMFile(path string) []discoveredCertEntry {
 	data, err := os.ReadFile(path)
 	if err != nil {
 		a.logger.Debug("failed to read file", "path", path, "error", err)
 		return nil
 	}
 	var entries []discoveredCertEntry
 	rest := data
 	for {
 		var block *pem.Block
 		block, rest = pem.Decode(rest)
 		if block == nil {
 			break
 		}
 		if block.Type != "CERTIFICATE" {
 			continue
 		}
 		cert, err := x509.ParseCertificate(block.Bytes)
 		if err != nil {
 			a.logger.Debug("failed to parse certificate in PEM", "path", path, "error", err)
 			continue
 		}
 		pemStr := string(pem.EncodeToMemory(block))
 		entries = append(entries, certToEntry(cert, path, "PEM", pemStr))
 	}
 	return entries
 }
 // parseDERFile reads a DER-encoded certificate file.
 func (a *Agent) parseDERFile(path string) (discoveredCertEntry, error) {
 	data, err := os.ReadFile(path)
 	if err != nil {
 		return discoveredCertEntry{}, fmt.Errorf("read failed: %w", err)
 	}
 	cert, err := x509.ParseCertificate(data)
 	if err != nil {
 		return discoveredCertEntry{}, fmt.Errorf("parse failed: %w", err)
 	}
 	// Convert to PEM for storage
 	pemStr := string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: data}))
 	return certToEntry(cert, path, "DER", pemStr), nil
 }
 // certToEntry converts a parsed x509.Certificate into a discoveredCertEntry.
 func certToEntry(cert *x509.Certificate, path, format, pemData string) discoveredCertEntry {
 	// Compute SHA-256 fingerprint
 	fingerprint := fmt.Sprintf("%x", sha256Sum(cert.Raw))
 	// Determine key algorithm and size
 	keyAlg, keySize := certKeyInfo(cert)
 	return discoveredCertEntry{
 		FingerprintSHA256: fingerprint,
 		CommonName:        cert.Subject.CommonName,
 		SANs:              cert.DNSNames,
 		SerialNumber:      cert.SerialNumber.Text(16),
 		IssuerDN:          cert.Issuer.String(),
 		SubjectDN:         cert.Subject.String(),
 		NotBefore:         cert.NotBefore.UTC().Format(time.RFC3339),
 		NotAfter:          cert.NotAfter.UTC().Format(time.RFC3339),
 		KeyAlgorithm:      keyAlg,
 		KeySize:           keySize,
 		IsCA:              cert.IsCA,
 		PEMData:           pemData,
 		SourcePath:        path,
 		SourceFormat:      format,
 	}
 }
 // sha256Sum returns the SHA-256 hash of data.
 func sha256Sum(data []byte) [32]byte {
 	return sha256.Sum256(data)
 }
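 // Illustrative aside (not part of this change): certToEntry renders the
 // [32]byte digest with fmt's %x verb, which yields a 64-character lowercase
 // hex string with no separators. The sketch below shows that formatting in
 // isolation; the helper name fingerprintHex is hypothetical and does not
 // exist in cmd/agent.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fingerprintHex is a hypothetical helper (not in the certctl source) that
// mirrors how certToEntry formats a SHA-256 fingerprint: hash the raw bytes,
// then print the fixed-size array as lowercase hex, two characters per byte.
func fingerprintHex(data []byte) string {
	sum := sha256.Sum256(data) // [32]byte
	return fmt.Sprintf("%x", sum)
}
```

 // In the agent the input would be cert.Raw (the DER encoding), so the same
 // certificate produces the same fingerprint whether it was found as PEM or DER.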
 // certKeyInfo extracts key algorithm name and size from a certificate.
 func certKeyInfo(cert *x509.Certificate) (string, int) {
 	switch pub := cert.PublicKey.(type) {
 	case *ecdsa.PublicKey:
 		return "ECDSA", pub.Curve.Params().BitSize
 	case *rsa.PublicKey:
 		return "RSA", pub.N.BitLen()
 	default:
 		switch cert.PublicKeyAlgorithm {
 		case x509.Ed25519:
 			return "Ed25519", 256
 		default:
 			return cert.PublicKeyAlgorithm.String(), 0
 		}
 	}
 }
@@ -6,16 +6,9 @@ package main
 import (
 	"bytes"
 	"context"
 	"crypto/ecdsa"
 	"crypto/elliptic"
 	"crypto/rand"
 	"crypto/rsa"
 	"crypto/sha256"
 	"crypto/tls"
 	"crypto/x509"
 	"crypto/x509/pkix"
 	"encoding/json"
 	"encoding/pem"
 	"errors"
 	"flag"
 	"fmt"
@@ -26,29 +19,11 @@ import (
 	"net/url"
 	"os"
 	"os/signal"
 	"path/filepath"
 	"runtime"
 	"strings"
 	"sync"
 	"syscall"
 	"time"
 	"github.com/certctl-io/certctl/internal/connector/target"
 	"github.com/certctl-io/certctl/internal/connector/target/apache"
 	"github.com/certctl-io/certctl/internal/connector/target/awsacm"
 	"github.com/certctl-io/certctl/internal/connector/target/azurekv"
 	"github.com/certctl-io/certctl/internal/connector/target/caddy"
 	"github.com/certctl-io/certctl/internal/connector/target/envoy"
 	"github.com/certctl-io/certctl/internal/connector/target/f5"
 	"github.com/certctl-io/certctl/internal/connector/target/haproxy"
 	"github.com/certctl-io/certctl/internal/connector/target/iis"
 	jks "github.com/certctl-io/certctl/internal/connector/target/javakeystore"
 	k8s "github.com/certctl-io/certctl/internal/connector/target/k8ssecret"
 	"github.com/certctl-io/certctl/internal/connector/target/nginx"
 	pf "github.com/certctl-io/certctl/internal/connector/target/postfix"
 	sshconn "github.com/certctl-io/certctl/internal/connector/target/ssh"
 	"github.com/certctl-io/certctl/internal/connector/target/traefik"
 	wcs "github.com/certctl-io/certctl/internal/connector/target/wincertstore"
 )
 // AgentConfig represents the agent-side configuration.
@@ -394,618 +369,6 @@ func (a *Agent) sendHeartbeat(ctx context.Context) {
 	a.logger.Debug("heartbeat acknowledged")
 }
 // pollForWork queries the control plane for actionable jobs and processes them.
 // Jobs may be deployment jobs (Pending) or CSR jobs (AwaitingCSR).
 // GET /api/v1/agents/{agentID}/work
 func (a *Agent) pollForWork(ctx context.Context) {
 	a.logger.Debug("polling for work", "agent_id", a.config.AgentID)
 	path := fmt.Sprintf("/api/v1/agents/%s/work", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
 	if err != nil {
 		a.logger.Error("work poll failed", "error", err)
 		a.consecutiveFailures++
 		return
 	}
 	defer resp.Body.Close()
 	// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
 	// other hot path that can observe an agent's soft-retirement; if the
 	// heartbeat tick happens to fire after a work-poll tick within the same
 	// retirement window, this branch catches it first. markRetired's sync.Once
 	// guards idempotency so racing both paths in the same tick only closes the
 	// signal channel once. No consecutiveFailures increment — retirement is
 	// not a transient failure.
 	if resp.StatusCode == http.StatusGone {
 		body, _ := io.ReadAll(resp.Body)
 		a.markRetired("work_poll", resp.StatusCode, string(body))
 		return
 	}
 	if resp.StatusCode != http.StatusOK {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("work poll rejected",
 			"status", resp.StatusCode,
 			"body", string(body))
 		a.consecutiveFailures++
 		return
 	}
 	var workResp WorkResponse
 	if err := json.NewDecoder(resp.Body).Decode(&workResp); err != nil {
 		a.logger.Error("failed to decode work response", "error", err)
 		a.consecutiveFailures++
 		return
 	}
 	a.consecutiveFailures = 0
 	if workResp.Count == 0 {
 		a.logger.Debug("no pending work")
 		return
 	}
 	a.logger.Info("received work", "job_count", workResp.Count)
 	// Dispatch each job: an AwaitingCSR status takes precedence over the job type;
 	// jobs that match neither case are skipped.
 	for _, job := range workResp.Jobs {
 		switch {
 		case job.Status == "AwaitingCSR":
 			// Agent keygen mode: generate key locally, create CSR, submit to server
 			a.executeCSRJob(ctx, job)
 		case job.Type == "Deployment":
 			a.executeDeploymentJob(ctx, job)
 		}
 	}
 }
 // executeCSRJob handles an AwaitingCSR job: generates a private key locally, creates a CSR,
 // and submits it to the control plane for signing. The private key is stored on the local
 // filesystem with 0600 permissions and NEVER sent to the server.
 //
 // Flow:
 // 1. Generate ECDSA P-256 key pair
 // 2. Store private key to disk (keyDir/certID.key) with 0600 permissions
 // 3. Create CSR with common name and SANs from work response
 // 4. Submit CSR to control plane via POST /api/v1/agents/{agentID}/csr
 // 5. Server signs the CSR and creates a cert version + deployment jobs
 func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
 	a.logger.Info("executing CSR job (agent-side key generation)",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"common_name", job.CommonName)
 	// Step 1: Generate ECDSA P-256 key pair
 	privKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
 	if err != nil {
 		a.logger.Error("failed to generate private key",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key generation failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("generated ECDSA P-256 key pair locally",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID)
 	// Step 2: Store private key to disk with secure permissions.
 	//
 	// Bundle-9 / Audit L-002 + L-003: marshal+write through helpers that
 	// (a) zeroize the in-heap DER buffer immediately after the PEM block is
 	// constructed so the private scalar's exposure window is bounded by
 	// this function call, and (b) assert the key directory is mode 0700
 	// before any write touches disk. Also defer-clear the PEM buffer for
 	// the same reason — the encoded key isn't sensitive in transit (it's
 	// going to disk) but lingers on the heap if we don't.
 	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
 	if err := ensureAgentKeyDirSecure(filepath.Dir(keyPath)); err != nil {
 		a.logger.Error("agent key dir hardening failed", "job_id", job.ID, "error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key dir hardening failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	var privKeyPEM []byte
 	if marshalErr := marshalAgentKeyAndZeroize(privKey, func(der []byte) error {
 		privKeyPEM = pem.EncodeToMemory(&pem.Block{
 			Type:  "EC PRIVATE KEY",
 			Bytes: der,
 		})
 		return nil
 	}); marshalErr != nil {
 		a.logger.Error("failed to marshal private key",
 			"job_id", job.ID,
 			"error", marshalErr)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key marshal failed: %v", marshalErr)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	defer clear(privKeyPEM)
 	if err := os.WriteFile(keyPath, privKeyPEM, 0600); err != nil {
 		a.logger.Error("failed to write private key to disk",
 			"job_id", job.ID,
 			"key_path", keyPath,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key storage failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("private key stored securely",
 		"job_id", job.ID,
 		"key_path", keyPath,
 		"permissions", "0600")
 	// Validate common name is present
 	if job.CommonName == "" {
 		a.logger.Error("empty common name in CSR job", "job_id", job.ID)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", "empty common name"); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "error", reportErr)
 		}
 		return
 	}
 	// Step 3: Create CSR with common name and SANs
 	// Split SANs into DNS names and email addresses for proper CSR encoding
 	var dnsNames []string
 	var emailAddresses []string
 	for _, san := range job.SANs {
 		if strings.Contains(san, "@") {
 			emailAddresses = append(emailAddresses, san)
 		} else {
 			dnsNames = append(dnsNames, san)
 		}
 	}
 	csrTemplate := &x509.CertificateRequest{
 		Subject: pkix.Name{
 			CommonName: job.CommonName,
 		},
 		DNSNames:       dnsNames,
 		EmailAddresses: emailAddresses,
 	}
 	csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
 	if err != nil {
 		a.logger.Error("failed to create CSR",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR creation failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	csrPEM := string(pem.EncodeToMemory(&pem.Block{
 		Type:  "CERTIFICATE REQUEST",
 		Bytes: csrDER,
 	}))
 	// Step 4: Submit CSR to the control plane (only the public key leaves the agent)
 	a.logger.Info("submitting CSR to control plane",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID)
 	submitPath := fmt.Sprintf("/api/v1/agents/%s/csr", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodPost, submitPath, map[string]string{
 		"csr_pem":        csrPEM,
 		"certificate_id": job.CertificateID,
 	})
 	if err != nil {
 		a.logger.Error("failed to submit CSR",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR submission failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusAccepted {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("CSR submission rejected",
 			"job_id", job.ID,
 			"status", resp.StatusCode,
 			"body", string(body))
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR rejected: %s", string(body))); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("CSR submitted and signed successfully",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"key_path", keyPath)
 }
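 // Illustrative aside (not part of this change): step 3 of executeCSRJob sorts
 // SANs into DNS names and email addresses with a simple "contains @" test.
 // The sketch below isolates that heuristic so its behavior is easy to check;
 // the function name splitSANs is hypothetical and does not exist in cmd/agent.

```go
package main

import "strings"

// splitSANs is a hypothetical stand-alone version of the SAN partitioning
// done inline in executeCSRJob: any entry containing "@" is treated as an
// email address, everything else as a DNS name.
func splitSANs(sans []string) (dnsNames, emailAddresses []string) {
	for _, san := range sans {
		if strings.Contains(san, "@") {
			emailAddresses = append(emailAddresses, san)
		} else {
			dnsNames = append(dnsNames, san)
		}
	}
	return dnsNames, emailAddresses
}
```

 // Under this heuristic an IP-address SAN such as "10.0.0.1" lands in the DNS
 // name list, because the only distinction drawn is the presence of "@".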
 // executeDeploymentJob executes a deployment job by fetching the certificate and deploying it
 // to the target system using the connector that createTargetConnector selects for the job's target_type.
 //
 // For agent keygen mode, the private key is read from the local key store (keyDir/certID.key)
 // rather than fetched from the server. The deployment includes the locally-held key.
 //
 // Flow:
 // 1. Report job as Running
 // 2. Fetch the certificate PEM from the control plane
 // 3. Load the local private key (agent keygen mode); the job fails if the key cannot be read
 // 4. Instantiate the target connector based on target_type from the work response
 // 5. Call DeployCertificate on the connector
 // 6. Report job as Completed (or Failed)
 func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
 	a.logger.Info("executing deployment job",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"target_type", job.TargetType)
 	// Report job as running
 	if err := a.reportJobStatus(ctx, job.ID, "Running", ""); err != nil {
 		a.logger.Error("failed to report job running", "error", err)
 	}
 	// Fetch the certificate from the control plane
 	certPEM, err := a.fetchCertificate(ctx, job.CertificateID)
 	if err != nil {
 		a.logger.Error("failed to fetch certificate",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("cert fetch failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("certificate fetched for deployment",
 		"job_id", job.ID,
 		"cert_length", len(certPEM))
 	// Split the PEM into the leaf certificate (first PEM block) and the remaining chain
 	certOnly, chainPEM := splitPEMChain(certPEM)
 	// Load the locally stored private key (agent keygen mode); a read failure fails the job
 	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
 	var keyPEM string
 	keyData, err := os.ReadFile(keyPath)
 	if err != nil {
 		a.logger.Error("failed to read local private key for deployment",
 			"job_id", job.ID,
 			"key_path", keyPath,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key read failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "error", reportErr)
 		}
 		return
 	}
 	keyPEM = string(keyData)
 	a.logger.Info("loaded local private key for deployment",
 		"job_id", job.ID,
 		"key_path", keyPath)
 	// Deploy to the target using the appropriate connector
 	if job.TargetType != "" {
 		connector, err := a.createTargetConnector(ctx, job.TargetType, job.TargetConfig)
 		if err != nil {
 			a.logger.Error("failed to create target connector",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("connector init failed: %v", err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		// Bundle 1 / RT-C1 closure (2026-05-12): defense in depth. The server
 		// runs internal/connector/target/configcheck.Validate on the way IN
 		// (Create/Update), and rejects shell metacharacters in command-bearing
 		// fields. Re-run the connector's full ValidateConfig here on the way
 		// OUT, before any DeployCertificate call. This catches (a) configs
 		// that pre-date the server-side guard, (b) corruption/tampering of
 		// the encrypted config blob, and (c) per-connector filesystem
 		// invariants (cert dir exists, paths writable) that the server can't
 		// check because the filesystem is on the agent host.
 		if err := connector.ValidateConfig(ctx, job.TargetConfig); err != nil {
 			a.logger.Error("connector config validation failed",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("%s config validation failed: %v", job.TargetType, err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		deployReq := target.DeploymentRequest{
 			CertPEM:      certOnly,
 			KeyPEM:       keyPEM,
 			ChainPEM:     chainPEM,
 			TargetConfig: job.TargetConfig,
 			Metadata: map[string]string{
 				"certificate_id": job.CertificateID,
 				"job_id":         job.ID,
 			},
 		}
 		// Phase 2 of the deploy-hardening I master bundle:
 		// per-target deploy mutex. Acquire BEFORE
 		// DeployCertificate so two concurrent renewals against
 		// the same target ID serialize. The lock is held for the
 		// full Deploy duration including PreCommit (validate),
 		// PostCommit (reload), and post-deploy verify (Phases
 		// 4-9). Released on every return path via defer.
 		var targetID string
 		if job.TargetID != nil {
 			targetID = *job.TargetID
 		}
 		if mu := a.targetDeployMutex(targetID); mu != nil {
 			mu.Lock()
 			defer mu.Unlock()
 		}
 		result, err := connector.DeployCertificate(ctx, deployReq)
 		if err != nil {
 			a.logger.Error("deployment failed",
 				"job_id", job.ID,
 				"target_type", job.TargetType,
 				"error", err)
 			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("deployment failed: %v", err)); reportErr != nil {
 				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 			}
 			return
 		}
 		a.logger.Info("target connector deployment completed",
 			"job_id", job.ID,
 			"target_type", job.TargetType,
 			"success", result.Success,
 			"message", result.Message)
 		// If verification is enabled, verify the deployment by probing the live TLS endpoint
 		targetHost, targetPort, err := extractTargetHostAndPort(job.TargetConfig)
 		if err != nil {
 			a.logger.Warn("could not extract target host/port for verification",
 				"job_id", job.ID,
 				"error", err)
 		} else {
 			a.verifyAndReportDeployment(ctx, job, targetHost, targetPort, certOnly)
 		}
 	} else {
 		a.logger.Info("no target type specified, skipping connector invocation",
 			"job_id", job.ID)
 	}
 	// Report job as completed
 	if err := a.reportJobStatus(ctx, job.ID, "Completed", ""); err != nil {
 		a.logger.Error("failed to report job completed", "error", err)
 		return
 	}
 	a.logger.Info("deployment job completed", "job_id", job.ID)
 }
 // createTargetConnector instantiates the appropriate target connector based on type.
 // ctx is threaded into SDK-driven connectors (AWSACM, AzureKeyVault) so credential
 // resolution honors caller cancellation / deadlines instead of using a fresh
 // context.Background() (the contextcheck linter enforces this — the original Rank 5
 // implementation used Background() and tripped CI on commit 502823d).
 func (a *Agent) createTargetConnector(ctx context.Context, targetType string, configJSON json.RawMessage) (target.Connector, error) {
 	switch targetType {
 	case "NGINX":
 		var cfg nginx.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid NGINX config: %w", err)
 			}
 		}
 		return nginx.New(&cfg, a.logger), nil
 	case "Apache":
 		var cfg apache.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Apache config: %w", err)
 			}
 		}
 		return apache.New(&cfg, a.logger), nil
 	case "HAProxy":
 		var cfg haproxy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid HAProxy config: %w", err)
 			}
 		}
 		return haproxy.New(&cfg, a.logger), nil
 	case "F5":
 		var cfg f5.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid F5 config: %w", err)
 			}
 		}
 		conn, err := f5.New(&cfg, a.logger)
 		if err != nil {
 			return nil, fmt.Errorf("failed to create F5 connector: %w", err)
 		}
 		return conn, nil
 	case "IIS":
 		var cfg iis.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid IIS config: %w", err)
 			}
 		}
 		return iis.New(&cfg, a.logger)
 	case "Traefik":
 		var cfg traefik.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Traefik config: %w", err)
 			}
 		}
 		return traefik.New(&cfg, a.logger), nil
 	case "Caddy":
 		var cfg caddy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Caddy config: %w", err)
 			}
 		}
 		return caddy.New(&cfg, a.logger), nil
 	case "Envoy":
 		var cfg envoy.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Envoy config: %w", err)
 			}
 		}
 		return envoy.New(&cfg, a.logger), nil
 	case "Postfix":
 		var cfg pf.Config
 		cfg.Mode = "postfix"
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Postfix config: %w", err)
 			}
 		}
 		return pf.New(&cfg, a.logger), nil
 	case "Dovecot":
 		var cfg pf.Config
 		cfg.Mode = "dovecot"
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid Dovecot config: %w", err)
 			}
 		}
 		return pf.New(&cfg, a.logger), nil
 	case "SSH":
 		var cfg sshconn.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid SSH config: %w", err)
 			}
 		}
 		return sshconn.New(&cfg, a.logger)
 	case "WinCertStore":
 		var cfg wcs.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid WinCertStore config: %w", err)
 			}
 		}
 		return wcs.New(&cfg, a.logger)
 	case "JavaKeystore":
 		var cfg jks.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid JavaKeystore config: %w", err)
 			}
 		}
 		return jks.New(&cfg, a.logger), nil
 	case "KubernetesSecrets":
 		var cfg k8s.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid KubernetesSecrets config: %w", err)
 			}
 		}
 		return k8s.New(&cfg, a.logger)
 	case "AWSACM":
 		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
 		// AWS Certificate Manager target — SDK-driven (no file I/O).
 		// LoadDefaultConfig handles the standard AWS credential chain
 		// (IRSA / EC2 instance profile / SSO / env vars) without any
 		// long-lived creds in connector Config.
 		var cfg awsacm.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid AWSACM config: %w", err)
 			}
 		}
 		return awsacm.New(ctx, &cfg, a.logger)
 	case "AzureKeyVault":
 		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
 		// Azure Key Vault target — SDK-driven (no file I/O).
 		// DefaultAzureCredential handles the standard Azure credential
 		// chain (managed identity / workload identity / env vars / az
 		// CLI fallback). Long-lived service-principal secrets are
 		// supported but discouraged via the credential_mode config.
 		var cfg azurekv.Config
 		if len(configJSON) > 0 {
 			if err := json.Unmarshal(configJSON, &cfg); err != nil {
 				return nil, fmt.Errorf("invalid AzureKeyVault config: %w", err)
 			}
 		}
 		return azurekv.New(ctx, &cfg, a.logger)
 	default:
 		return nil, fmt.Errorf("unsupported target type: %s", targetType)
 	}
 }
 // splitPEMChain splits a PEM chain into the first certificate (cert) and the rest (chain).
 // The control plane returns the full chain as a single string with PEM blocks concatenated.
 func splitPEMChain(pemChain string) (string, string) {
 	data := []byte(pemChain)
 	block, rest := pem.Decode(data)
 	if block == nil {
 		return pemChain, ""
 	}
 	cert := string(pem.EncodeToMemory(block))
 	// Everything after the first block, trimmed of surrounding whitespace, is the chain
 	// (empty when the input held a single certificate).
 	return cert, strings.TrimSpace(string(rest))
 }
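 // Illustrative aside (not part of this change): splitPEMChain relies on
 // pem.Decode returning the first block plus the undecoded remainder. The
 // sketch below reproduces that split on synthetic data; splitFirstPEMBlock
 // and encodeBlock are hypothetical names, and the block payloads are
 // placeholder bytes rather than real certificates (pem.Decode does not
 // inspect the payload).

```go
package main

import (
	"encoding/pem"
	"strings"
)

// encodeBlock is a hypothetical helper that builds a CERTIFICATE-typed PEM
// block around three placeholder bytes. It is NOT a valid X.509 certificate.
func encodeBlock(b byte) string {
	return string(pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: []byte{b, b, b},
	}))
}

// splitFirstPEMBlock is a hypothetical stand-alone variant of splitPEMChain:
// it returns the first PEM block re-encoded, and the whitespace-trimmed
// remainder. Input with no PEM block is returned unchanged as the first value.
func splitFirstPEMBlock(chain string) (first, rest string) {
	block, remainder := pem.Decode([]byte(chain))
	if block == nil {
		return chain, ""
	}
	return string(pem.EncodeToMemory(block)), strings.TrimSpace(string(remainder))
}
```

 // Because the remainder is trimmed, the chain half loses its trailing
 // newline; a consumer that needs one would have to append it again.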
 // fetchCertificate retrieves the certificate PEM chain from the control plane.
 // GET /api/v1/agents/{agentID}/certificates/{certID}
 func (a *Agent) fetchCertificate(ctx context.Context, certID string) (string, error) {
 	path := fmt.Sprintf("/api/v1/agents/%s/certificates/%s", a.config.AgentID, certID)
 	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
 	if err != nil {
 		return "", fmt.Errorf("request failed: %w", err)
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusOK {
 		body, _ := io.ReadAll(resp.Body)
 		return "", fmt.Errorf("server returned %d: %s", resp.StatusCode, string(body))
 	}
 	var certResp struct {
 		CertificatePEM string `json:"certificate_pem"`
 	}
 	if err := json.NewDecoder(resp.Body).Decode(&certResp); err != nil {
 		return "", fmt.Errorf("failed to decode response: %w", err)
 	}
 	return certResp.CertificatePEM, nil
 }
 // reportJobStatus reports the result of a job back to the control plane.
 // POST /api/v1/agents/{agentID}/jobs/{jobID}/status
 func (a *Agent) reportJobStatus(ctx context.Context, jobID string, status string, errorMsg string) error {
@@ -1067,239 +430,6 @@ func (a *Agent) makeRequest(ctx context.Context, method, path string, body inter
 	return resp, nil
 }
 // runDiscoveryScan walks configured directories, parses certificate files, and reports
 // discovered certificates to the control plane.
 // Supports PEM and DER encoded X.509 certificates.
 func (a *Agent) runDiscoveryScan(ctx context.Context) {
 	a.logger.Info("starting filesystem certificate discovery scan",
 		"directories", a.config.DiscoveryDirs)
 	startTime := time.Now()
 	var certs []discoveredCertEntry
 	var scanErrors []string
 	for _, dir := range a.config.DiscoveryDirs {
 		a.logger.Debug("scanning directory", "path", dir)
 		err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
 			if err != nil {
 				scanErrors = append(scanErrors, fmt.Sprintf("walk error at %s: %v", path, err))
 				return nil // continue walking
 			}
 			if info.IsDir() {
 				return nil
 			}
 			// Skip files larger than 1 MiB (unlikely to be a certificate)
 			if info.Size() > 1*1024*1024 {
 				return nil
 			}
 			// Check file extension
 			ext := strings.ToLower(filepath.Ext(path))
 			switch ext {
 			case ".pem", ".crt", ".cer", ".cert":
 				found := a.parsePEMFile(path)
 				certs = append(certs, found...)
 			case ".der":
 				if entry, err := a.parseDERFile(path); err == nil {
 					certs = append(certs, entry)
 				} else {
 					a.logger.Debug("skipping non-cert DER file", "path", path, "error", err)
 				}
 			default:
 				// Skip key files and extensionless files; try PEM parsing for any other extension
 				if ext == "" || ext == ".key" {
 					return nil
 				}
 				found := a.parsePEMFile(path)
 				if len(found) > 0 {
 					certs = append(certs, found...)
 				}
 			}
 			return nil
 		})
 		if err != nil {
 			scanErrors = append(scanErrors, fmt.Sprintf("failed to walk %s: %v", dir, err))
 		}
 	}
 	scanDuration := time.Since(startTime)
 	a.logger.Info("discovery scan completed",
 		"certificates_found", len(certs),
 		"errors", len(scanErrors),
 		"duration_ms", scanDuration.Milliseconds())
 	if len(certs) == 0 && len(scanErrors) == 0 {
 		a.logger.Debug("no certificates found and no errors, skipping report")
 		return
 	}
 	// Build report payload
 	entries := make([]map[string]interface{}, len(certs))
 	for i, c := range certs {
 		entries[i] = map[string]interface{}{
 			"fingerprint_sha256": c.FingerprintSHA256,
 			"common_name":        c.CommonName,
 			"sans":               c.SANs,
 			"serial_number":      c.SerialNumber,
 			"issuer_dn":          c.IssuerDN,
 			"subject_dn":         c.SubjectDN,
 			"not_before":         c.NotBefore,
 			"not_after":          c.NotAfter,
 			"key_algorithm":      c.KeyAlgorithm,
 			"key_size":           c.KeySize,
 			"is_ca":              c.IsCA,
 			"pem_data":           c.PEMData,
 			"source_path":        c.SourcePath,
 			"source_format":      c.SourceFormat,
 		}
 	}
 	report := map[string]interface{}{
 		"agent_id":         a.config.AgentID,
 		"directories":      a.config.DiscoveryDirs,
 		"certificates":     entries,
 		"errors":           scanErrors,
 		"scan_duration_ms": int(scanDuration.Milliseconds()),
 	}
 	// Submit to control plane
 	path := fmt.Sprintf("/api/v1/agents/%s/discoveries", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodPost, path, report)
 	if err != nil {
 		a.logger.Error("failed to submit discovery report", "error", err)
 		return
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusAccepted {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("discovery report rejected",
 			"status", resp.StatusCode,
 			"body", string(body))
 		return
 	}
 	a.logger.Info("discovery report submitted successfully",
 		"certificates", len(certs),
 		"errors", len(scanErrors))
 }
 // discoveredCertEntry holds parsed certificate metadata for reporting.
 type discoveredCertEntry struct {
 	FingerprintSHA256 string   `json:"fingerprint_sha256"`
 	CommonName        string   `json:"common_name"`
 	SANs              []string `json:"sans"`
 	SerialNumber      string   `json:"serial_number"`
 	IssuerDN          string   `json:"issuer_dn"`
 	SubjectDN         string   `json:"subject_dn"`
 	NotBefore         string   `json:"not_before"`
 	NotAfter          string   `json:"not_after"`
 	KeyAlgorithm      string   `json:"key_algorithm"`
 	KeySize           int      `json:"key_size"`
 	IsCA              bool     `json:"is_ca"`
 	PEMData           string   `json:"pem_data"`
 	SourcePath        string   `json:"source_path"`
 	SourceFormat      string   `json:"source_format"`
 }
 // parsePEMFile reads a file and extracts all X.509 certificates from PEM blocks.
 func (a *Agent) parsePEMFile(path string) []discoveredCertEntry {
 	data, err := os.ReadFile(path)
 	if err != nil {
 		a.logger.Debug("failed to read file", "path", path, "error", err)
 		return nil
 	}
 	var entries []discoveredCertEntry
 	rest := data
 	for {
 		var block *pem.Block
 		block, rest = pem.Decode(rest)
 		if block == nil {
 			break
 		}
 		if block.Type != "CERTIFICATE" {
 			continue
 		}
 		cert, err := x509.ParseCertificate(block.Bytes)
 		if err != nil {
 			a.logger.Debug("failed to parse certificate in PEM", "path", path, "error", err)
 			continue
 		}
 		pemStr := string(pem.EncodeToMemory(block))
 		entries = append(entries, certToEntry(cert, path, "PEM", pemStr))
 	}
 	return entries
 }
 // parseDERFile reads a DER-encoded certificate file.
 func (a *Agent) parseDERFile(path string) (discoveredCertEntry, error) {
 	data, err := os.ReadFile(path)
 	if err != nil {
 		return discoveredCertEntry{}, fmt.Errorf("read failed: %w", err)
 	}
 	cert, err := x509.ParseCertificate(data)
 	if err != nil {
 		return discoveredCertEntry{}, fmt.Errorf("parse failed: %w", err)
 	}
 	// Convert to PEM for storage
 	pemStr := string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: data}))
 	return certToEntry(cert, path, "DER", pemStr), nil
 }
 // certToEntry converts a parsed x509.Certificate into a discoveredCertEntry.
 func certToEntry(cert *x509.Certificate, path, format, pemData string) discoveredCertEntry {
 	// Compute SHA-256 fingerprint
 	fingerprint := fmt.Sprintf("%x", sha256Sum(cert.Raw))
 	// Determine key algorithm and size
 	keyAlg, keySize := certKeyInfo(cert)
 	return discoveredCertEntry{
 		FingerprintSHA256: fingerprint,
 		CommonName:        cert.Subject.CommonName,
 		SANs:              cert.DNSNames,
 		SerialNumber:      cert.SerialNumber.Text(16),
 		IssuerDN:          cert.Issuer.String(),
 		SubjectDN:         cert.Subject.String(),
 		NotBefore:         cert.NotBefore.UTC().Format(time.RFC3339),
 		NotAfter:          cert.NotAfter.UTC().Format(time.RFC3339),
 		KeyAlgorithm:      keyAlg,
 		KeySize:           keySize,
 		IsCA:              cert.IsCA,
 		PEMData:           pemData,
 		SourcePath:        path,
 		SourceFormat:      format,
 	}
 }
 // sha256Sum returns the SHA-256 hash of data.
 func sha256Sum(data []byte) [32]byte {
 	return sha256.Sum256(data)
 }
 // certKeyInfo extracts key algorithm name and size from a certificate.
 func certKeyInfo(cert *x509.Certificate) (string, int) {
 	switch pub := cert.PublicKey.(type) {
 	case *ecdsa.PublicKey:
 		return "ECDSA", pub.Curve.Params().BitSize
 	case *rsa.PublicKey:
 		return "RSA", pub.N.BitLen()
 	default:
 		switch cert.PublicKeyAlgorithm {
 		case x509.Ed25519:
 			return "Ed25519", 256
 		default:
 			return cert.PublicKeyAlgorithm.String(), 0
 		}
 	}
 }
 func main() {
 	// Parse command-line flags (with env var fallbacks for Docker deployment)
 	serverURL := flag.String("server", getEnvDefault("CERTCTL_SERVER_URL", "https://localhost:8443"), "Control plane server URL (must be https://)")
@@ -0,0 +1,278 @@
 // Copyright 2026 certctl LLC. All rights reserved.
 // SPDX-License-Identifier: BUSL-1.1
 package main
 import (
 	"context"
 	"crypto/ecdsa"
 	"crypto/elliptic"
 	"crypto/rand"
 	"crypto/x509"
 	"crypto/x509/pkix"
 	"encoding/json"
 	"encoding/pem"
 	"fmt"
 	"io"
 	"net/http"
 	"os"
 	"path/filepath"
 	"strings"
 )
 // Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
 // cmd/agent/main.go via the Option B sibling-file pattern (mirrors
 // the Sprint 8 cmd/server cut). Package stays `main`; all methods
 // are still defined on *Agent so every call site continues to
 // resolve through Go's same-package method-set without any
 // import-path change.
 //
 // This file holds the WORK-POLLING entry point + CSR-job execution
 // — the inbound side of the agent's pull-only deployment model
 // (per CLAUDE.md "Pull-only deployment model" architecture
 // decision):
 //
 //   - pollForWork: queries GET /api/v1/agents/{id}/work each tick;
 //     dispatches each returned JobItem to the appropriate
 //     executor (CSR vs deployment).
 //   - executeCSRJob: handles AwaitingCSR jobs by generating an
 //     ECDSA P-256 key locally, persisting it to keyDir/<certID>.key
 //     with 0600 permissions (key NEVER leaves the agent — see
 //     CLAUDE.md "Agent-based key management"), creating the CSR,
 //     and POSTing it to the control plane for signing.
 //
 // The deployment-job executor lives in deploy.go alongside the
 // target connector factory + deploy-only helpers (splitPEMChain,
 // fetchCertificate). The discovery scan lives in discovery.go.
 // pollForWork queries the control plane for actionable jobs and processes them.
 // Jobs may be deployment jobs (Pending) or CSR jobs (AwaitingCSR).
 // GET /api/v1/agents/{agentID}/work
 func (a *Agent) pollForWork(ctx context.Context) {
 	a.logger.Debug("polling for work", "agent_id", a.config.AgentID)
 	path := fmt.Sprintf("/api/v1/agents/%s/work", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
 	if err != nil {
 		a.logger.Error("work poll failed", "error", err)
 		a.consecutiveFailures++
 		return
 	}
 	defer resp.Body.Close()
 	// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
 	// other hot path that can observe an agent's soft-retirement; if the
 	// heartbeat tick happens to fire after a work-poll tick within the same
 	// retirement window, this branch catches it first. markRetired's sync.Once
 	// guards idempotency so racing both paths in the same tick only closes the
 	// signal channel once. No consecutiveFailures increment — retirement is
 	// not a transient failure.
 	if resp.StatusCode == http.StatusGone {
 		body, _ := io.ReadAll(resp.Body)
 		a.markRetired("work_poll", resp.StatusCode, string(body))
 		return
 	}
 	if resp.StatusCode != http.StatusOK {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("work poll rejected",
 			"status", resp.StatusCode,
 			"body", string(body))
 		a.consecutiveFailures++
 		return
 	}
 	var workResp WorkResponse
 	if err := json.NewDecoder(resp.Body).Decode(&workResp); err != nil {
 		a.logger.Error("failed to decode work response", "error", err)
 		a.consecutiveFailures++
 		return
 	}
 	a.consecutiveFailures = 0
 	if workResp.Count == 0 {
 		a.logger.Debug("no pending work")
 		return
 	}
 	a.logger.Info("received work", "job_count", workResp.Count)
 	// Process each job based on type and status
 	for _, job := range workResp.Jobs {
 		switch {
 		case job.Status == "AwaitingCSR":
 			// Agent keygen mode: generate key locally, create CSR, submit to server
 			a.executeCSRJob(ctx, job)
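 		case job.Type == "Deployment":
 			a.executeDeploymentJob(ctx, job)
 		default:
 			// Unknown type/status combinations are skipped; log them so they are not dropped silently.
 			a.logger.Debug("skipping unrecognized job", "job_id", job.ID, "type", job.Type, "status", job.Status)
 		}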
 	}
 }
 // executeCSRJob handles an AwaitingCSR job: generates a private key locally, creates a CSR,
 // and submits it to the control plane for signing. The private key is stored on the local
 // filesystem with 0600 permissions and NEVER sent to the server.
 //
 // Flow:
 // 1. Generate ECDSA P-256 key pair
 // 2. Store private key to disk (keyDir/certID.key) with 0600 permissions
 // 3. Create CSR with common name and SANs from work response
 // 4. Submit CSR to control plane via POST /api/v1/agents/{agentID}/csr
 // 5. Server signs the CSR and creates a cert version + deployment jobs
 func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
 	a.logger.Info("executing CSR job (agent-side key generation)",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"common_name", job.CommonName)
 	// Step 1: Generate ECDSA P-256 key pair
 	privKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
 	if err != nil {
 		a.logger.Error("failed to generate private key",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key generation failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("generated ECDSA P-256 key pair locally",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID)
 	// Step 2: Store private key to disk with secure permissions.
 	//
 	// Bundle-9 / Audit L-002 + L-003: marshal+write through helpers that
 	// (a) zeroize the in-heap DER buffer immediately after the PEM block is
 	// constructed, so the private scalar's exposure window is bounded by
 	// this function call, and (b) assert the key directory is mode 0700
 	// before any write touches disk. The PEM buffer is defer-cleared for
 	// the same reason: the encoded key is headed to disk anyway, but it
 	// would otherwise linger on the heap after this function returns.
 	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
 	if err := ensureAgentKeyDirSecure(filepath.Dir(keyPath)); err != nil {
 		a.logger.Error("agent key dir hardening failed", "job_id", job.ID, "error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key dir hardening failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	var privKeyPEM []byte
 	if marshalErr := marshalAgentKeyAndZeroize(privKey, func(der []byte) error {
 		privKeyPEM = pem.EncodeToMemory(&pem.Block{
 			Type:  "EC PRIVATE KEY",
 			Bytes: der,
 		})
 		return nil
 	}); marshalErr != nil {
 		a.logger.Error("failed to marshal private key",
 			"job_id", job.ID,
 			"error", marshalErr)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key marshal failed: %v", marshalErr)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	defer clear(privKeyPEM)
 	if err := os.WriteFile(keyPath, privKeyPEM, 0600); err != nil {
 		a.logger.Error("failed to write private key to disk",
 			"job_id", job.ID,
 			"key_path", keyPath,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key storage failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("private key stored securely",
 		"job_id", job.ID,
 		"key_path", keyPath,
 		"permissions", "0600")
 	// Validate common name is present
 	if job.CommonName == "" {
 		a.logger.Error("empty common name in CSR job", "job_id", job.ID)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", "empty common name"); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	// Step 3: Create CSR with common name and SANs.
 	// SANs containing "@" are encoded as email addresses; everything else is
 	// encoded as a DNS name (IP and URI SANs are not distinguished here).
 	var dnsNames []string
 	var emailAddresses []string
 	for _, san := range job.SANs {
 		if strings.Contains(san, "@") {
 			emailAddresses = append(emailAddresses, san)
 		} else {
 			dnsNames = append(dnsNames, san)
 		}
 	}
 	csrTemplate := &x509.CertificateRequest{
 		Subject: pkix.Name{
 			CommonName: job.CommonName,
 		},
 		DNSNames:       dnsNames,
 		EmailAddresses: emailAddresses,
 	}
 	csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
 	if err != nil {
 		a.logger.Error("failed to create CSR",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR creation failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	csrPEM := string(pem.EncodeToMemory(&pem.Block{
 		Type:  "CERTIFICATE REQUEST",
 		Bytes: csrDER,
 	}))
 	// Step 4: Submit CSR to the control plane (only the public key leaves the agent)
 	a.logger.Info("submitting CSR to control plane",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID)
 	submitPath := fmt.Sprintf("/api/v1/agents/%s/csr", a.config.AgentID)
 	resp, err := a.makeRequest(ctx, http.MethodPost, submitPath, map[string]string{
 		"csr_pem":        csrPEM,
 		"certificate_id": job.CertificateID,
 	})
 	if err != nil {
 		a.logger.Error("failed to submit CSR",
 			"job_id", job.ID,
 			"error", err)
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR submission failed: %v", err)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	defer resp.Body.Close()
 	if resp.StatusCode != http.StatusAccepted {
 		body, _ := io.ReadAll(resp.Body)
 		a.logger.Error("CSR submission rejected",
 			"job_id", job.ID,
 			"status", resp.StatusCode,
 			"body", string(body))
 		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR rejected: %s", body)); reportErr != nil {
 			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
 		}
 		return
 	}
 	a.logger.Info("CSR submitted and signed successfully",
 		"job_id", job.ID,
 		"certificate_id", job.CertificateID,
 		"key_path", keyPath)
 }