Implement M8: agent-side key generation with ECDSA P-256

Private keys never leave agent infrastructure. Agents generate ECDSA P-256
key pairs locally, store them with 0600 permissions, and submit only the CSR
(public key) to the control plane. New AwaitingCSR job state pauses
renewal/issuance jobs until the agent submits its CSR. Server-side keygen
retained behind CERTCTL_KEYGEN_MODE=server for demo/development.

Key changes:
- Dual keygen mode via CERTCTL_KEYGEN_MODE (agent default, server for demo)
- AwaitingCSR job state with CommonName/SANs in work response
- Agent ECDSA P-256 keygen, local key storage, CSR-only submission
- CompleteAgentCSRRenewal server-side flow for agent-submitted CSRs
- DeploymentRequest.KeyPEM for agent-provided keys during deployment
- Dockerfile.agent creates /var/lib/certctl/keys with correct ownership

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Shankar
2026-03-15 13:51:41 -04:00
parent 4b2e694abe
commit 3ef6316457
15 changed files with 576 additions and 184 deletions
+40 -24
View File
@@ -7,13 +7,13 @@ You are my long-term copilot for building certctl — a self-hosted certificate
- [x] PostgreSQL 16 schema (14 tables, TEXT primary keys, idempotent migrations)
- [x] REST API — 41 endpoints under /api/v1/ with pagination, filtering, async actions
- [x] Web dashboard — Vite + React 18 + TypeScript + TanStack Query, 11 views wired to real API, dark theme
- [x] Agent binary — heartbeat, work polling, cert fetch, job status reporting (real HTTP calls)
- [x] Agent binary — heartbeat, work polling, cert fetch, CSR generation, job status reporting (real HTTP calls)
- [x] Local CA issuer connector — crypto/x509, in-memory CA, self-signed certs
- [x] Issuer connector wired end-to-end — Local CA registered in server, adapter bridging connector<->service layers
- [x] Renewal job processor — generates RSA key + CSR, calls issuer, stores cert version, creates deployment jobs
- [x] Renewal job processor — dual-mode keygen: agent mode (AwaitingCSR) or server mode (server-side RSA key + CSR)
- [x] Issuance job processor — reuses renewal flow (same mechanics for Local CA)
- [x] Agent CSR signing — SubmitCSR forwards to issuer connector, stores signed cert version
- [x] Agent work API — GET /agents/{id}/work returns pending deployment jobs
- [x] Agent work API — GET /agents/{id}/work returns pending deployment + AwaitingCSR jobs with cert details
- [x] Agent job status API — POST /agents/{id}/jobs/{job_id}/status for agent feedback
- [x] NGINX target connector — file write, config validation, reload
- [x] F5 BIG-IP target connector — REST API integration
@@ -22,7 +22,7 @@ You are my long-term copilot for building certctl — a self-hosted certificate
- [x] Email + Webhook notifier interfaces
- [x] Policy engine — 4 rule types, violation tracking, severity levels
- [x] Immutable audit trail — append-only, no update/delete
- [x] Job system — 4 types (Issuance, Renewal, Deployment, Validation), state machine
- [x] Job system — 4 types (Issuance, Renewal, Deployment, Validation), 6 states (Pending, AwaitingCSR, Running, Completed, Failed, Cancelled)
- [x] Background scheduler — 4 loops (renewal 1h, jobs 30s, health 2m, notifications 1m)
- [x] Docker Compose deployment — server + postgres + agent, health checks, seed data
- [x] Demo mode — 14 certs, 5 agents, 5 targets, policies, audit events, notifications
@@ -35,9 +35,12 @@ You are my long-term copilot for building certctl — a self-hosted certificate
- [x] Token bucket rate limiting — configurable RPS/burst, 429 responses with Retry-After header
- [x] Configurable CORS — per-origin allowlist or wildcard, preflight caching
- [x] GUI auth flow — login screen, auth context, 401 auto-redirect, logout button
- [x] Agent-side key generation — ECDSA P-256 keygen on agent, CSR-only submission, private keys never leave agent
- [x] Dual keygen mode — `CERTCTL_KEYGEN_MODE=agent` (default, production) or `server` (demo only with log warning)
- [x] AwaitingCSR job state — renewal/issuance jobs pause for agent CSR submission in agent keygen mode
- [x] Agent local key storage — keys written to `CERTCTL_KEY_DIR` (default /var/lib/certctl/keys) with 0600 permissions
### What's NOT Wired Up Yet (Pre-v1.0 Gaps)
- [ ] **Agent-side key generation**: V1 uses server-side key generation for Local CA (pragmatic for dev/demo). Must move to agents before v1.0.
- [ ] **End-to-end test hardening**: Handler tests only cover 2 of 7 files. No negative-path integration tests (issuer down, malformed certs, DB failures). No scheduler or connector tests. No frontend tests.
---
@@ -74,25 +77,37 @@ API key auth middleware with SHA-256 hashing and constant-time comparison. `CERT
The principle: **every backend feature ships with its corresponding GUI surface.** The GUI is where ops teams spend 80% of their time — it must be an operational tool, not a demo viewer.
### M8: Agent-Side Key Generation
**Goal**: Private keys never leave agent infrastructure. This is the crypto architecture gate for v1.0.
### M8: Agent-Side Key Generation ✅ COMPLETE
**Goal**: Private keys never leave agent infrastructure. Crypto architecture gate for v1.0.
**Agent key generation:**
- Agent generates RSA-2048 or ECDSA P-256 key pair locally
- Agent creates CSR (public key only) and submits via `POST /agents/{id}/csr`
- Control plane signs CSR via issuer connector, returns cert + chain (no private key)
- Agent stores key locally with file permissions 0600
**Implemented:**
- `CERTCTL_KEYGEN_MODE` config: `agent` (default) or `server` (demo only)
- `AwaitingCSR` job state: renewal/issuance jobs pause for agent to generate key + submit CSR
- Agent generates ECDSA P-256 key pairs locally (crypto/ecdsa + crypto/elliptic)
- Agent stores private keys to disk (`CERTCTL_KEY_DIR`, default `/var/lib/certctl/keys`) with 0600 permissions
- Agent creates CSR with common name + SANs from work response, submits via `POST /agents/{id}/csr`
- Server signs agent-submitted CSR via `CompleteAgentCSRRenewal`, stores cert version with CSR (not private key)
- Work endpoint enriched: AwaitingCSR jobs include `common_name` and `sans` so agent knows what CSR to generate
- Deployment jobs read local private key from key store for target connector deployment
- `DeploymentRequest` struct extended with `KeyPEM` field for agent-provided keys
- Server-side keygen retained for `CERTCTL_KEYGEN_MODE=server` with explicit log warning
- Docker Compose demo uses `CERTCTL_KEYGEN_MODE=server` for backward compatibility
**Server-side keygen flagging:**
- Server-side keygen retained only for Local CA with explicit `--server-side-keygen` flag
- Default behavior: reject issuance requests without agent-submitted CSR
- Clear log warnings when server-side keygen is active
**Files created:**
(none — all changes to existing files)
**ACME integration:**
- Agent handles ACME HTTP-01 challenge locally (challenge server on agent)
- Or: agent submits CSR, server handles ACME flow, returns signed cert
**Deliverables**: Private keys isolated from control plane for all production issuers. Server-side keygen flagged as demo-only.
**Files modified:**
- `internal/domain/job.go` — Added `JobStatusAwaitingCSR`, `CommonName`/`SANs` fields to `WorkItem`
- `internal/config/config.go` — Added `KeygenConfig` struct and `CERTCTL_KEYGEN_MODE` env var
- `internal/service/renewal.go` — Added `keygenMode` field, split `ProcessRenewalJob` into `processRenewalAgentKeygen` and `processRenewalServerKeygen`, added `CompleteAgentCSRRenewal`, `GetAwaitingCSRJobs`, `createDeploymentJobs`
- `internal/service/agent.go` — Added `renewalService` dependency, updated `SubmitCSR` to handle AwaitingCSR flow, updated `GetPendingWork` to return AwaitingCSR jobs, updated `GetWorkWithTargets` to enrich with cert details
- `internal/connector/target/interface.go` — Added `KeyPEM` field to `DeploymentRequest`
- `cmd/server/main.go` — Passes `keygenMode` to `NewRenewalService`, passes `renewalService` to `NewAgentService`, added keygen mode log line
- `cmd/agent/main.go` — Added crypto imports, `KeyDir` config, `executeCSRJob` method (ECDSA P-256 keygen + CSR creation + submission), deployment reads local key, added `--key-dir` flag / `CERTCTL_KEY_DIR` env var
- `deploy/docker-compose.yml` — Added `CERTCTL_KEYGEN_MODE=server` for demo
- `internal/service/renewal_test.go` — Updated all `NewRenewalService` calls with `keygenMode` param
- `internal/service/job_test.go` — Updated `NewRenewalService` call with `keygenMode` param
- `internal/integration/lifecycle_test.go` — Updated `NewRenewalService` and `NewAgentService` calls
### M9: End-to-End Test Hardening
**Goal**: Comprehensive test coverage across all layers as the final quality gate before v1.0.
@@ -130,11 +145,12 @@ The principle: **every backend feature ships with its corresponding GUI surface.
### v1.0.0 Release
**Gate criteria** — all must be true:
- [ ] All M5M9 deliverables complete
- [x] All M5M8 deliverables complete
- [ ] M9 deliverables complete (test hardening)
- [ ] CI green with coverage gates passing (service 70%+, handler 60%+)
- [ ] GUI functional against real API (no demo mode fallback needed)
- [ ] Agent-side keygen working for ACME issuer
- [ ] API auth enforced by default
- [x] Agent-side keygen working (ECDSA P-256, AwaitingCSR flow)
- [x] API auth enforced by default
- [ ] Negative-path integration tests passing
- [ ] README screenshots of actual dashboard
- [ ] Tagged Docker images published
+3 -1
View File
@@ -29,7 +29,9 @@ WORKDIR /app
COPY --from=builder /app/bin/agent .
RUN chown -R certctl:certctl /app
# Create key storage directory for agent-side keygen
RUN mkdir -p /var/lib/certctl/keys && \
chown -R certctl:certctl /app /var/lib/certctl
USER certctl
+7 -7
View File
@@ -8,7 +8,7 @@ A self-hosted certificate lifecycle platform. Track, renew, and deploy TLS certi
## What It Does
certctl gives you a single pane of glass for every TLS certificate in your organization. The **web dashboard** shows your full certificate inventory — what's healthy, what's expiring, what's already expired, and who owns each one. The **REST API** (40+ endpoints) lets you automate everything. **Agents** deployed on your infrastructure handle certificate deployment, and key generation moves to agents in M8 so private keys never leave your servers.
certctl gives you a single pane of glass for every TLS certificate in your organization. The **web dashboard** shows your full certificate inventory — what's healthy, what's expiring, what's already expired, and who owns each one. The **REST API** (40+ endpoints) lets you automate everything. **Agents** deployed on your infrastructure generate private keys locally and submit CSRs — private keys never leave your servers.
```mermaid
flowchart LR
@@ -115,7 +115,7 @@ flowchart TB
### Key Design Decisions
- **Private keys isolated from the control plane (M8 goal).** Currently, the Local CA issuer generates server-side keys for simplicity. M8 moves key generation to agents — agents generate keys locally and submit CSRs (public key only). The architecture is designed for this separation; server-side keygen will be flagged as demo-only.
- **Private keys isolated from the control plane.** Agents generate ECDSA P-256 keys locally and submit CSRs (public key only). The server signs the CSR and returns the certificate — private keys never touch the control plane. Server-side keygen is available via `CERTCTL_KEYGEN_MODE=server` for demo/development only.
- **TEXT primary keys, not UUIDs.** IDs are human-readable prefixed strings (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resource types at a glance in logs and queries.
- **Handler → Service → Repository layering.** Handlers define their own service interfaces for clean dependency inversion. No global service singletons.
- **Idempotent migrations.** All schema uses `IF NOT EXISTS` and seed data uses `ON CONFLICT (id) DO NOTHING`, safe for repeated execution.
@@ -153,6 +153,7 @@ All server environment variables use the `CERTCTL_` prefix:
| `CERTCTL_LOG_FORMAT` | `json` | Log format: `json` or `text` |
| `CERTCTL_AUTH_TYPE` | `api-key` | Auth mode: `api-key`, `jwt`, or `none` |
| `CERTCTL_AUTH_SECRET` | — | Required for `api-key` and `jwt` auth types |
| `CERTCTL_KEYGEN_MODE` | `agent` | Key generation mode: `agent` (production) or `server` (demo only) |
| `CERTCTL_ACME_DIRECTORY_URL` | — | ACME directory URL (e.g., Let's Encrypt staging) |
| `CERTCTL_ACME_EMAIL` | — | Contact email for ACME account registration |
@@ -164,6 +165,7 @@ Agent environment variables:
| `CERTCTL_API_KEY` | — | Agent API key |
| `CERTCTL_AGENT_NAME` | `certctl-agent` | Agent display name |
| `CERTCTL_AGENT_ID` | — | Registered agent ID (required) |
| `CERTCTL_KEY_DIR` | `/var/lib/certctl/keys` | Directory for storing private keys (agent keygen mode) |
Docker Compose overrides these for the demo stack (see `deploy/docker-compose.yml`): port `8443`, auth type `none`, database pointing to the postgres container.
@@ -292,8 +294,8 @@ make docker-clean # Stop + remove volumes
## Security
### Private Key Management
- **V1 (Local CA)**: The control plane generates ephemeral RSA-2048 keys server-side for certificate issuance. This simplifies the initial implementation but means private keys exist on the control plane temporarily. Keys are stored in certificate version records.
- **M8+**: Private keys will be generated exclusively on agents, never sent to the control plane. Keys stored with file permissions 0600 and rotated after successful renewal.
- **Agent keygen mode (default)**: Agents generate ECDSA P-256 keys locally and store them with 0600 permissions in `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`). Only the CSR (public key) is sent to the control plane. Private keys never leave agent infrastructure.
- **Server keygen mode (demo only)**: Set `CERTCTL_KEYGEN_MODE=server` for development/demo with Local CA. The control plane generates RSA-2048 keys server-side. A log warning is emitted at startup.
### Authentication
- Agent-to-server: API key (registered at agent creation)
@@ -308,9 +310,7 @@ make docker-clean # Stop + remove volumes
## Roadmap
### V1 (in progress → v1.0.0)
Backend complete: end-to-end lifecycle, Local CA + ACME v2 issuers, NGINX/F5/IIS targets, threshold alerting. GUI fully wired to real API with 11 views. CI pipeline running. Remaining milestones before v1.0 tag:
- **M7: Auth + Rate Limiting** — API key auth enforced by default, token bucket rate limiting, CORS configuration, GUI login flow
- **M8: Agent-Side Key Generation** — agents generate keys locally, submit CSR only, private keys never leave infrastructure, server-side keygen flagged as demo-only
Backend complete: end-to-end lifecycle, Local CA + ACME v2 issuers, NGINX/F5/IIS targets, threshold alerting, agent-side keygen, auth + rate limiting. GUI fully wired to real API with 11 views. CI pipeline running. Remaining milestone before v1.0 tag:
- **M9: End-to-End Test Hardening** — handler tests for all 7 files, negative-path integration tests (issuer down, malformed CSR, DB failure), scheduler and connector tests, CI coverage gates (service 70%+, handler 60%+)
### V2: Operational Maturity
+166 -9
View File
@@ -3,7 +3,13 @@ package main
import (
"bytes"
"context"
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/x509"
"crypto/x509/pkix"
"encoding/json"
"encoding/pem"
"flag"
"fmt"
"io"
@@ -11,6 +17,7 @@ import (
"net/http"
"os"
"os/signal"
"path/filepath"
"syscall"
"time"
@@ -27,10 +34,13 @@ type AgentConfig struct {
AgentName string // Agent name for identification
AgentID string // Agent ID for API calls (set after registration or from env)
Hostname string // Server hostname
KeyDir string // Directory for storing private keys (default: /var/lib/certctl/keys)
}
// Agent represents the local agent that runs on target servers.
// It periodically sends heartbeats, polls for work, and executes deployment jobs.
// It periodically sends heartbeats, polls for work, and executes deployment and CSR jobs.
// In agent keygen mode, private keys are generated and stored locally — they never leave
// this process or filesystem.
type Agent struct {
config *AgentConfig
logger *slog.Logger
@@ -47,11 +57,13 @@ type WorkResponse struct {
Count int `json:"count"`
}
// JobItem represents a job returned from the control plane, enriched with target details.
// JobItem represents a job returned from the control plane, enriched with target/cert details.
type JobItem struct {
ID string `json:"id"`
Type string `json:"type"`
CertificateID string `json:"certificate_id"`
CommonName string `json:"common_name,omitempty"`
SANs []string `json:"sans,omitempty"`
TargetID *string `json:"target_id,omitempty"`
TargetType string `json:"target_type,omitempty"`
TargetConfig json.RawMessage `json:"target_config,omitempty"`
@@ -75,7 +87,13 @@ func (a *Agent) Run(ctx context.Context) error {
a.logger.Info("agent starting",
"server_url", a.config.ServerURL,
"agent_name", a.config.AgentName,
"agent_id", a.config.AgentID)
"agent_id", a.config.AgentID,
"key_dir", a.config.KeyDir)
// Ensure key directory exists with secure permissions
if err := os.MkdirAll(a.config.KeyDir, 0700); err != nil {
return fmt.Errorf("failed to create key directory %s: %w", a.config.KeyDir, err)
}
// Create ticker channels for heartbeat and polling
heartbeatTicker := time.NewTicker(a.heartbeatInterval)
@@ -131,7 +149,8 @@ func (a *Agent) sendHeartbeat(ctx context.Context) {
a.logger.Debug("heartbeat acknowledged")
}
// pollForWork queries the control plane for pending deployment jobs and processes them.
// pollForWork queries the control plane for actionable jobs and processes them.
// Jobs may be deployment jobs (Pending) or CSR jobs (AwaitingCSR).
// GET /api/v1/agents/{agentID}/work
func (a *Agent) pollForWork(ctx context.Context) {
a.logger.Debug("polling for work", "agent_id", a.config.AgentID)
@@ -165,23 +184,148 @@ func (a *Agent) pollForWork(ctx context.Context) {
a.logger.Info("received work", "job_count", workResp.Count)
// Process each job
// Process each job based on type and status
for _, job := range workResp.Jobs {
if job.Type == "Deployment" {
switch {
case job.Status == "AwaitingCSR":
// Agent keygen mode: generate key locally, create CSR, submit to server
a.executeCSRJob(ctx, job)
case job.Type == "Deployment":
a.executeDeploymentJob(ctx, job)
}
}
}
// executeCSRJob handles an AwaitingCSR job: generates a private key locally, creates a CSR,
// and submits it to the control plane for signing. The private key is stored on the local
// filesystem with 0600 permissions and NEVER sent to the server.
//
// Flow:
// 1. Generate ECDSA P-256 key pair
// 2. Store private key to disk (keyDir/certID.key) with 0600 permissions
// 3. Create CSR with common name and SANs from work response
// 4. Submit CSR to control plane via POST /agents/{id}/csr
// 5. Server signs the CSR and creates a cert version + deployment jobs
func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
a.logger.Info("executing CSR job (agent-side key generation)",
"job_id", job.ID,
"certificate_id", job.CertificateID,
"common_name", job.CommonName)
// Step 1: Generate ECDSA P-256 key pair
privKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
a.logger.Error("failed to generate private key",
"job_id", job.ID,
"error", err)
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key generation failed: %v", err))
return
}
a.logger.Info("generated ECDSA P-256 key pair locally",
"job_id", job.ID,
"certificate_id", job.CertificateID)
// Step 2: Store private key to disk with secure permissions
keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
privKeyDER, err := x509.MarshalECPrivateKey(privKey)
if err != nil {
a.logger.Error("failed to marshal private key",
"job_id", job.ID,
"error", err)
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key marshal failed: %v", err))
return
}
privKeyPEM := pem.EncodeToMemory(&pem.Block{
Type: "EC PRIVATE KEY",
Bytes: privKeyDER,
})
if err := os.WriteFile(keyPath, privKeyPEM, 0600); err != nil {
a.logger.Error("failed to write private key to disk",
"job_id", job.ID,
"key_path", keyPath,
"error", err)
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key storage failed: %v", err))
return
}
a.logger.Info("private key stored securely",
"job_id", job.ID,
"key_path", keyPath,
"permissions", "0600")
// Step 3: Create CSR with common name and SANs
csrTemplate := &x509.CertificateRequest{
Subject: pkix.Name{
CommonName: job.CommonName,
},
DNSNames: job.SANs,
}
csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
if err != nil {
a.logger.Error("failed to create CSR",
"job_id", job.ID,
"error", err)
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR creation failed: %v", err))
return
}
csrPEM := string(pem.EncodeToMemory(&pem.Block{
Type: "CERTIFICATE REQUEST",
Bytes: csrDER,
}))
// Step 4: Submit CSR to the control plane (only the public key leaves the agent)
a.logger.Info("submitting CSR to control plane",
"job_id", job.ID,
"certificate_id", job.CertificateID)
submitPath := fmt.Sprintf("/api/v1/agents/%s/csr", a.config.AgentID)
resp, err := a.makeRequest(ctx, http.MethodPost, submitPath, map[string]string{
"csr_pem": csrPEM,
"certificate_id": job.CertificateID,
})
if err != nil {
a.logger.Error("failed to submit CSR",
"job_id", job.ID,
"error", err)
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR submission failed: %v", err))
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusAccepted {
body, _ := io.ReadAll(resp.Body)
a.logger.Error("CSR submission rejected",
"job_id", job.ID,
"status", resp.StatusCode,
"body", string(body))
_ = a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR rejected: %s", string(body)))
return
}
a.logger.Info("CSR submitted and signed successfully",
"job_id", job.ID,
"certificate_id", job.CertificateID,
"key_path", keyPath)
}
// executeDeploymentJob executes a deployment job by fetching the certificate and deploying it
// to the target system using the appropriate connector (NGINX, F5 BIG-IP, or IIS).
//
// For agent keygen mode, the private key is read from the local key store (keyDir/certID.key)
// rather than fetched from the server. The deployment includes the locally-held key.
//
// Flow:
// 1. Report job as Running
// 2. Fetch the certificate PEM from the control plane
// 3. Instantiate the target connector based on target_type from the work response
// 4. Call DeployCertificate on the connector
// 5. Report job as Completed (or Failed)
// 3. Load local private key if it exists (agent keygen mode)
// 4. Instantiate the target connector based on target_type from the work response
// 5. Call DeployCertificate on the connector
// 6. Report job as Completed (or Failed)
func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
a.logger.Info("executing deployment job",
"job_id", job.ID,
@@ -210,6 +354,16 @@ func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
// Split PEM into cert and chain (separated by double newline between PEM blocks)
certOnly, chainPEM := splitPEMChain(certPEM)
// Check for locally-stored private key (agent keygen mode)
keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
var keyPEM string
if keyData, err := os.ReadFile(keyPath); err == nil {
keyPEM = string(keyData)
a.logger.Info("loaded local private key for deployment",
"job_id", job.ID,
"key_path", keyPath)
}
// Deploy to the target using the appropriate connector
if job.TargetType != "" {
connector, err := a.createTargetConnector(job.TargetType, job.TargetConfig)
@@ -224,6 +378,7 @@ func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
deployReq := target.DeploymentRequest{
CertPEM: certOnly,
KeyPEM: keyPEM,
ChainPEM: chainPEM,
TargetConfig: job.TargetConfig,
Metadata: map[string]string{
@@ -418,6 +573,7 @@ func main() {
apiKey := flag.String("api-key", getEnvDefault("CERTCTL_API_KEY", ""), "Agent API key")
agentName := flag.String("name", getEnvDefault("CERTCTL_AGENT_NAME", "certctl-agent"), "Agent name")
agentID := flag.String("agent-id", getEnvDefault("CERTCTL_AGENT_ID", ""), "Agent ID (from registration)")
keyDir := flag.String("key-dir", getEnvDefault("CERTCTL_KEY_DIR", "/var/lib/certctl/keys"), "Directory for storing private keys")
flag.Parse()
if *apiKey == "" {
@@ -453,6 +609,7 @@ func main() {
AgentName: *agentName,
AgentID: *agentID,
Hostname: hostname,
KeyDir: *keyDir,
}
// Create and start agent
+8 -2
View File
@@ -102,10 +102,10 @@ func main() {
policyService := service.NewPolicyService(policyRepo, auditService)
certificateService := service.NewCertificateService(certificateRepo, policyService, auditService)
notificationService := service.NewNotificationService(notificationRepo, make(map[string]service.Notifier))
renewalService := service.NewRenewalService(certificateRepo, jobRepo, renewalPolicyRepo, auditService, notificationService, issuerRegistry)
renewalService := service.NewRenewalService(certificateRepo, jobRepo, renewalPolicyRepo, auditService, notificationService, issuerRegistry, cfg.Keygen.Mode)
deploymentService := service.NewDeploymentService(jobRepo, targetRepo, agentRepo, certificateRepo, auditService, notificationService)
jobService := service.NewJobService(jobRepo, renewalService, deploymentService, logger)
agentService := service.NewAgentService(agentRepo, certificateRepo, jobRepo, targetRepo, auditService, issuerRegistry)
agentService := service.NewAgentService(agentRepo, certificateRepo, jobRepo, targetRepo, auditService, issuerRegistry, renewalService)
issuerService := service.NewIssuerService(issuerRepo, auditService)
targetService := service.NewTargetService(targetRepo, auditService)
teamService := service.NewTeamService(teamRepo, auditService)
@@ -208,6 +208,12 @@ func main() {
logger.Info("authentication enabled", "type", cfg.Auth.Type)
}
if cfg.Keygen.Mode == "server" {
logger.Warn("server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only")
} else {
logger.Info("agent-side key generation enabled — private keys never leave agent infrastructure")
}
// Apply middleware to API router
apiHandler := middleware.Chain(apiRouter, middlewareStack...)
+1
View File
@@ -38,6 +38,7 @@ services:
CERTCTL_SERVER_PORT: 8443
CERTCTL_LOG_LEVEL: info
CERTCTL_AUTH_TYPE: none
CERTCTL_KEYGEN_MODE: server # Demo uses server-side keygen; production should use "agent"
ports:
- "8443:8443"
networks:
+55 -51
View File
@@ -8,7 +8,7 @@ New to certificates? Read the [Concepts Guide](concepts.md) first.
### Design Principles
1. **Private Key Isolation** — Agents generate keys locally and submit CSRs only. Private keys never touch the control plane. (M8 milestone; currently server-side keygen for Local CA demo, flagged explicitly.)
1. **Private Key Isolation** — Agents generate ECDSA P-256 keys locally and submit CSRs only. Private keys never touch the control plane. Server-side keygen available via `CERTCTL_KEYGEN_MODE=server` for demo only.
2. **GUI as Primary Interface** — The web dashboard is the operational control plane, not a secondary viewer. Every backend feature ships with its corresponding GUI surface.
3. **Decoupled Operations** — Agents operate autonomously; the control plane coordinates but doesn't block agent function
4. **Audit-First** — Complete traceability of all issuance, deployment, and rotation events
@@ -74,9 +74,9 @@ The server exposes a REST API under `/api/v1/` and optionally serves the web das
### Agents
Lightweight Go processes that run on or near your infrastructure. Agents poll the control plane for pending deployment jobs, fetch signed certificates, deploy them to target systems, and report job status back. In M8, agents will also generate private keys locally and create CSRs — private keys will never leave agent infrastructure. Agents communicate with the control plane via HTTP and authenticate with API keys.
Lightweight Go processes that run on or near your infrastructure. Agents generate ECDSA P-256 private keys locally, create CSRs, and submit them to the control plane for signing — private keys never leave agent infrastructure. Agents also handle certificate deployment to target systems (NGINX, F5, IIS) and report job status. They communicate with the control plane via HTTP and authenticate with API keys.
The agent runs two background loops: a heartbeat (every 60 seconds) to signal it's alive, and a work poll (every 30 seconds) to check for pending deployment jobs via `GET /api/v1/agents/{id}/work`. When a job is found, the agent fetches the certificate, executes the deployment, and reports status via `POST /api/v1/agents/{id}/jobs/{job_id}/status`.
The agent runs two background loops: a heartbeat (every 60 seconds) to signal it's alive, and a work poll (every 30 seconds) to check for actionable jobs via `GET /api/v1/agents/{id}/work`. Jobs may be `AwaitingCSR` (agent needs to generate key + submit CSR) or `Deployment` (agent needs to deploy a certificate). Private keys are stored in `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`) with 0600 permissions.
### Web Dashboard
@@ -234,58 +234,62 @@ sequenceDiagram
### 2. Certificate Issuance
#### V1: Server-Side Key Generation (Local CA)
#### Agent-Side Key Generation (Default)
In V1, the control plane generates keys and CSRs server-side for the Local CA. This simplifies the initial implementation — the full agent-side key generation flow is planned for M8.
In the default `agent` keygen mode (`CERTCTL_KEYGEN_MODE=agent`), the control plane never touches private keys. When a renewal or issuance job is created, it enters `AwaitingCSR` state. The agent picks it up, generates an ECDSA P-256 key pair locally, and submits only the CSR (public key).
```mermaid
sequenceDiagram
participant S as Scheduler
participant SVC as RenewalService
participant DB as PostgreSQL
participant A as Agent
participant API as Control Plane API
participant ISS as Issuer Connector
S->>SVC: ProcessRenewalJob(job)
SVC->>DB: UPDATE job SET status='AwaitingCSR'
SVC->>DB: UPDATE cert SET status='RenewalInProgress'
A->>API: GET /agents/{id}/work
API-->>A: [{id, type:"Renewal", status:"AwaitingCSR", common_name, sans}]
A->>A: Generate ECDSA P-256 key pair
A->>A: Store key to CERTCTL_KEY_DIR/certId.key (0600)
A->>A: Create CSR with CN + SANs
A->>API: POST /agents/{id}/csr<br/>{csr_pem, certificate_id}
API->>SVC: CompleteAgentCSRRenewal(job, cert, csrPEM)
SVC->>ISS: RenewCertificate(CN, SANs, csrPEM)
ISS-->>SVC: IssuanceResult{cert_pem, chain_pem, serial}
SVC->>DB: INSERT INTO certificate_versions (PEM chain + CSR only)
SVC->>DB: UPDATE cert SET status='Active', expires_at
SVC->>DB: CREATE deployment jobs for targets
Note over A: Agent deploys using locally-held private key
```
#### Server-Side Key Generation (Demo Only)
Set `CERTCTL_KEYGEN_MODE=server` for development/demo with Local CA. The control plane generates RSA-2048 keys server-side. A log warning is emitted at startup.
```mermaid
sequenceDiagram
participant U as User / Scheduler
participant API as Control Plane API
participant SVC as RenewalService
participant ISS as IssuerConnector
participant DB as PostgreSQL
U->>API: POST /api/v1/certificates/{id}/renew
API->>SVC: ProcessRenewalJob(job)
U->>SVC: ProcessRenewalJob(job)
SVC->>SVC: Generate RSA-2048 key pair (server-side)
SVC->>SVC: Create CSR with CN + SANs
SVC->>ISS: IssueCertificate(commonName, sans, csrPEM)
ISS-->>SVC: IssuanceResult{cert_pem, chain_pem, serial, not_after}
SVC->>ISS: RenewCertificate(CN, SANs, csrPEM)
ISS-->>SVC: IssuanceResult{cert_pem, chain_pem, serial}
SVC->>DB: INSERT INTO certificate_versions (PEM + private key)
SVC->>DB: UPDATE cert SET status='Active'
SVC->>DB: CREATE deployment jobs
SVC->>SVC: Compute SHA-256 fingerprint
SVC->>DB: INSERT INTO certificate_versions (PEM chain + CSR)
SVC->>DB: UPDATE managed_certificates SET status='Active', expires_at
SVC->>DB: INSERT INTO audit_events
SVC->>DB: CREATE deployment jobs for all mapped targets
Note over SVC: Deployment jobs picked up by agents<br/>via GET /api/v1/agents/{id}/work
```
#### M8 (Planned): Agent-Side Key Generation
```mermaid
sequenceDiagram
participant A as Agent
participant API as Control Plane API
participant ISS as Issuer Connector
participant DB as PostgreSQL
A->>A: Generate RSA-2048 key pair locally
A->>A: Create CSR (CN + SANs, public key only)
A->>API: POST /api/v1/agents/{id}/csr<br/>{csr_pem, certificate_id}
API->>ISS: IssueCertificate(IssuanceRequest{CSR})
ISS-->>API: IssuanceResult{cert_pem, chain_pem, serial, not_after}
API->>DB: INSERT INTO certificate_versions
API->>DB: UPDATE managed_certificates SET status='Active'
API-->>A: {certificate_pem, chain_pem}<br/>(NO private key in response)
A->>A: Store cert + chain locally (key never leaves agent)
A->>A: Deploy to target system
Note over SVC: WARNING: Private keys touch control plane
```
### 3. Deploy Certificate to Target
@@ -296,7 +300,7 @@ The agent deploys certificates using target connectors. Each connector knows how
- **F5 BIG-IP**: Calls the F5 REST API to upload certificate and update virtual server bindings
- **IIS**: Uses WinRM to import the certificate into the Windows certificate store and bind it to an IIS site
The agent handles both the certificate (public) and the private key (local only). The control plane never sees the private key.
The agent handles both the certificate (public) and the private key (read from local key store at `CERTCTL_KEY_DIR`). The control plane never sees the private key.
### 4. Automatic Renewal
@@ -455,16 +459,16 @@ flowchart LR
style ROT fill:#efe,stroke:#3c3
```
**V1 (Current):** The Local CA issuer generates RSA-2048 keys and CSRs server-side within `RenewalService.ProcessRenewalJob`. Private key material is stored alongside the CSR in the `certificate_versions` table. This is a pragmatic V1 trade-off to get the end-to-end lifecycle working.
**Agent keygen mode (default, `CERTCTL_KEYGEN_MODE=agent`):** Private keys follow a strict lifecycle on agents:
**M8 (Target Architecture):** Private keys follow a strict lifecycle on agents:
1. **Generated on the agent** — ECDSA P-256, never sent to the control plane
2. **Stored on the agent**`CERTCTL_KEY_DIR` with file permissions 0600
3. **Used by the agent** — for deployment to targets (via `DeploymentRequest.KeyPEM`)
4. **Rotated by the agent** — old keys overwritten after successful renewal
1. **Generated on the agent** — never sent to the control plane
2. **Stored on the agent** — file permissions 0600, owned by the agent process user
3. **Used by the agent** — for deployment to targets and CSR generation
4. **Rotated by the agent** — old keys deleted after successful renewal
The control plane only handles public material: certificates, chains, and CSRs.
The M8+ architecture ensures the control plane only handles public material: certificates, chains, and CSRs.
**Server keygen mode (`CERTCTL_KEYGEN_MODE=server`, demo only):** The control plane generates RSA-2048 keys server-side within `processRenewalServerKeygen`. Private keys are stored in `certificate_versions.csr_pem`. A log warning is emitted at startup. Use only for Local CA development/demo.
### Authentication
+21
View File
@@ -18,6 +18,15 @@ type Config struct {
Auth AuthConfig
RateLimit RateLimitConfig
CORS CORSConfig
Keygen KeygenConfig
}
// KeygenConfig controls where private keys are generated.
type KeygenConfig struct {
// Mode: "agent" (default, production) or "server" (demo only, Local CA).
// In "agent" mode, renewal/issuance jobs enter AwaitingCSR state and agents generate keys locally.
// In "server" mode, the control plane generates keys (private keys touch the server — demo only).
Mode string
}
// ServerConfig contains HTTP server configuration.
@@ -101,6 +110,9 @@ func Load() (*Config, error) {
CORS: CORSConfig{
AllowedOrigins: getEnvList("CERTCTL_CORS_ORIGINS", nil),
},
Keygen: KeygenConfig{
Mode: getEnv("CERTCTL_KEYGEN_MODE", "agent"),
},
}
if err := cfg.Validate(); err != nil {
@@ -161,6 +173,15 @@ func (c *Config) Validate() error {
return fmt.Errorf("auth secret is required for auth type %s", c.Auth.Type)
}
// Validate keygen mode
validKeygenModes := map[string]bool{
"agent": true,
"server": true,
}
if !validKeygenModes[c.Keygen.Mode] {
return fmt.Errorf("invalid keygen mode: %s (must be 'agent' or 'server')", c.Keygen.Mode)
}
// Validate scheduler intervals
if c.Scheduler.RenewalCheckInterval < 1*time.Minute {
return fmt.Errorf("renewal check interval must be at least 1 minute")
+3 -1
View File
@@ -20,9 +20,11 @@ type Connector interface {
}
// DeploymentRequest contains the parameters for deploying a certificate to a target.
// Note: This request NEVER contains a private key. The agent generates keys locally.
// In agent keygen mode, KeyPEM is populated from the agent's local key store.
// In server keygen mode (demo only), KeyPEM may be empty if the key was embedded in the cert version.
type DeploymentRequest struct {
CertPEM string `json:"cert_pem"`
KeyPEM string `json:"key_pem,omitempty"`
ChainPEM string `json:"chain_pem"`
TargetConfig json.RawMessage `json:"target_config"`
Metadata map[string]string `json:"metadata,omitempty"`
+8 -5
View File
@@ -35,11 +35,12 @@ const (
type JobStatus string
const (
JobStatusPending JobStatus = "Pending"
JobStatusRunning JobStatus = "Running"
JobStatusCompleted JobStatus = "Completed"
JobStatusFailed JobStatus = "Failed"
JobStatusCancelled JobStatus = "Cancelled"
JobStatusPending JobStatus = "Pending"
JobStatusAwaitingCSR JobStatus = "AwaitingCSR"
JobStatusRunning JobStatus = "Running"
JobStatusCompleted JobStatus = "Completed"
JobStatusFailed JobStatus = "Failed"
JobStatusCancelled JobStatus = "Cancelled"
)
// DeploymentJob represents a job that deploys a certificate to a target via an agent.
@@ -55,6 +56,8 @@ type WorkItem struct {
ID string `json:"id"`
Type JobType `json:"type"`
CertificateID string `json:"certificate_id"`
CommonName string `json:"common_name,omitempty"`
SANs []string `json:"sans,omitempty"`
TargetID *string `json:"target_id,omitempty"`
TargetType string `json:"target_type,omitempty"`
TargetConfig json.RawMessage `json:"target_config,omitempty"`
+2 -2
View File
@@ -52,10 +52,10 @@ func TestCertificateLifecycle(t *testing.T) {
policyService := service.NewPolicyService(policyRepo, auditService)
certificateService := service.NewCertificateService(certRepo, policyService, auditService)
notificationService := service.NewNotificationService(notifRepo, make(map[string]service.Notifier))
renewalService := service.NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, auditService, notificationService, issuerRegistry)
renewalService := service.NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, auditService, notificationService, issuerRegistry, "server")
deploymentService := service.NewDeploymentService(jobRepo, targetRepo, agentRepo, certRepo, auditService, notificationService)
jobService := service.NewJobService(jobRepo, renewalService, deploymentService, logger)
agentService := service.NewAgentService(agentRepo, certRepo, jobRepo, targetRepo, auditService, issuerRegistry)
agentService := service.NewAgentService(agentRepo, certRepo, jobRepo, targetRepo, auditService, issuerRegistry, renewalService)
issuerService := service.NewIssuerService(issuerRepo, auditService)
// Initialize handlers
+70 -32
View File
@@ -20,6 +20,7 @@ type AgentService struct {
targetRepo repository.TargetRepository
auditService *AuditService
issuerRegistry map[string]IssuerConnector
renewalService *RenewalService
}
// NewAgentService creates a new agent service.
@@ -30,6 +31,7 @@ func NewAgentService(
targetRepo repository.TargetRepository,
auditService *AuditService,
issuerRegistry map[string]IssuerConnector,
renewalService *RenewalService,
) *AgentService {
return &AgentService{
agentRepo: agentRepo,
@@ -38,6 +40,7 @@ func NewAgentService(
targetRepo: targetRepo,
auditService: auditService,
issuerRegistry: issuerRegistry,
renewalService: renewalService,
}
}
@@ -106,8 +109,9 @@ func (s *AgentService) Heartbeat(agentID string) error {
}
// SubmitCSR validates and processes a Certificate Signing Request from an agent.
// It forwards the CSR to the appropriate issuer connector for signing, then stores
// the resulting certificate version.
// In agent keygen mode, this completes an AwaitingCSR renewal job by signing the CSR
// and storing the cert version. The private key stays on the agent — only the CSR
// (public key) reaches the server.
func (s *AgentService) SubmitCSR(ctx context.Context, agentID string, certID string, csrPEM []byte) error {
// Fetch agent
agent, err := s.agentRepo.Get(ctx, agentID)
@@ -120,39 +124,57 @@ func (s *AgentService) SubmitCSR(ctx context.Context, agentID string, certID str
return fmt.Errorf("invalid CSR: empty")
}
// If a certificate ID is provided, sign the CSR via the issuer connector
if certID != "" {
cert, err := s.certRepo.Get(ctx, certID)
if err != nil {
return fmt.Errorf("failed to fetch certificate: %w", err)
}
// Look up the issuer connector
// Check for AwaitingCSR jobs first (agent keygen mode)
if s.renewalService != nil {
awaitingJobs, err := s.renewalService.GetAwaitingCSRJobs(ctx, certID)
if err == nil && len(awaitingJobs) > 0 {
// Complete the renewal via the renewal service (signs CSR, stores version, creates deploy jobs)
if err := s.renewalService.CompleteAgentCSRRenewal(ctx, awaitingJobs[0], cert, string(csrPEM)); err != nil {
return fmt.Errorf("failed to complete agent CSR renewal: %w", err)
}
// Record audit event
_ = s.auditService.RecordEvent(ctx, agent.ID, domain.ActorTypeAgent,
"csr_submitted", "certificate", certID,
map[string]interface{}{
"agent_hostname": agent.Hostname,
"keygen_mode": "agent",
"job_id": awaitingJobs[0].ID,
})
return nil
}
}
// Fallback: direct issuer signing (no AwaitingCSR job — ad-hoc CSR submission)
connector, ok := s.issuerRegistry[cert.IssuerID]
if ok {
// Sign the CSR via the issuer connector
result, err := connector.IssueCertificate(ctx, cert.CommonName, cert.SANs, string(csrPEM))
if err != nil {
return fmt.Errorf("issuer signing failed: %w", err)
}
// Store the signed certificate as a new version
version := &domain.CertificateVersion{
ID: generateID("certver"),
CertificateID: certID,
SerialNumber: result.Serial,
NotBefore: result.NotBefore,
NotAfter: result.NotAfter,
PEMChain: result.CertPEM + "\n" + result.ChainPEM,
CSRPEM: string(csrPEM),
CreatedAt: time.Now(),
ID: generateID("certver"),
CertificateID: certID,
SerialNumber: result.Serial,
NotBefore: result.NotBefore,
NotAfter: result.NotAfter,
PEMChain: result.CertPEM + "\n" + result.ChainPEM,
CSRPEM: string(csrPEM),
CreatedAt: time.Now(),
}
if err := s.certRepo.CreateVersion(ctx, version); err != nil {
return fmt.Errorf("failed to store certificate version: %w", err)
}
// Update certificate status and expiry
cert.Status = domain.CertificateStatusActive
cert.ExpiresAt = result.NotAfter
now := time.Now()
@@ -165,11 +187,9 @@ func (s *AgentService) SubmitCSR(ctx context.Context, agentID string, certID str
}
// Record audit event
if err := s.auditService.RecordEvent(ctx, agent.ID, domain.ActorTypeAgent,
_ = s.auditService.RecordEvent(ctx, agent.ID, domain.ActorTypeAgent,
"csr_submitted", "certificate", certID,
map[string]interface{}{"agent_hostname": agent.Hostname}); err != nil {
fmt.Printf("failed to record audit event: %v\n", err)
}
map[string]interface{}{"agent_hostname": agent.Hostname})
return nil
}
@@ -210,7 +230,8 @@ func (s *AgentService) GetCertificateForAgent(ctx context.Context, agentID strin
return []byte(latestVersion.PEMChain), nil
}
// GetPendingWork returns deployment jobs assigned to an agent.
// GetPendingWork returns actionable jobs for an agent: deployment jobs (Pending) and
// renewal/issuance jobs awaiting CSR submission (AwaitingCSR).
func (s *AgentService) GetPendingWork(ctx context.Context, agentID string) ([]*domain.Job, error) {
// Fetch agent to verify it exists
_, err := s.agentRepo.Get(ctx, agentID)
@@ -218,23 +239,30 @@ func (s *AgentService) GetPendingWork(ctx context.Context, agentID string) ([]*d
return nil, fmt.Errorf("failed to fetch agent: %w", err)
}
// Get all deployment jobs
jobs, err := s.jobRepo.ListByStatus(ctx, domain.JobStatusPending)
var workForAgent []*domain.Job
// Get pending deployment jobs
pendingJobs, err := s.jobRepo.ListByStatus(ctx, domain.JobStatusPending)
if err != nil {
return nil, fmt.Errorf("failed to list pending jobs: %w", err)
}
var workForAgent []*domain.Job
// Filter to only jobs assigned to this agent
// Note: In this implementation, agents don't filter jobs by assignment
// All deployment jobs are returned for the agent to process
for _, job := range jobs {
for _, job := range pendingJobs {
if job.Type == domain.JobTypeDeployment {
workForAgent = append(workForAgent, job)
}
}
// Get AwaitingCSR jobs (agent keygen mode — agent needs to generate key + submit CSR)
awaitingJobs, err := s.jobRepo.ListByStatus(ctx, domain.JobStatusAwaitingCSR)
if err != nil {
return nil, fmt.Errorf("failed to list awaiting CSR jobs: %w", err)
}
for _, job := range awaitingJobs {
if job.Type == domain.JobTypeRenewal || job.Type == domain.JobTypeIssuance {
workForAgent = append(workForAgent, job)
}
}
return workForAgent, nil
}
@@ -381,8 +409,9 @@ func (s *AgentService) GetWork(agentID string) ([]domain.Job, error) {
return result, nil
}
// GetWorkWithTargets returns pending deployment jobs enriched with target type and config.
// This allows agents to know which connector to invoke for each deployment job.
// GetWorkWithTargets returns actionable jobs enriched with target/certificate details.
// Deployment jobs include target type + config. AwaitingCSR jobs include common name + SANs
// so the agent knows what CSR to generate.
func (s *AgentService) GetWorkWithTargets(agentID string) ([]domain.WorkItem, error) {
jobs, err := s.GetPendingWork(context.Background(), agentID)
if err != nil {
@@ -402,7 +431,7 @@ func (s *AgentService) GetWorkWithTargets(agentID string) ([]domain.WorkItem, er
Status: j.Status,
}
// Enrich with target details if target ID is present
// Enrich with target details for deployment jobs
if j.TargetID != nil && *j.TargetID != "" {
target, err := s.targetRepo.Get(context.Background(), *j.TargetID)
if err == nil {
@@ -411,6 +440,15 @@ func (s *AgentService) GetWorkWithTargets(agentID string) ([]domain.WorkItem, er
}
}
// Enrich with certificate details for AwaitingCSR jobs (agent needs CN + SANs for CSR)
if j.Status == domain.JobStatusAwaitingCSR {
cert, err := s.certRepo.Get(context.Background(), j.CertificateID)
if err == nil {
item.CommonName = cert.CommonName
item.SANs = cert.SANs
}
}
items = append(items, item)
}
+1 -1
View File
@@ -28,7 +28,7 @@ func newTestJobService(jobRepo *mockJobRepo) *JobService {
targetRepo := &mockTargetRepo{Targets: make(map[string]*domain.DeploymentTarget)}
agentRepo := &mockAgentRepo{Agents: make(map[string]*domain.Agent)}
renewalService := NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, auditService, notifService, make(map[string]IssuerConnector))
renewalService := NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, auditService, notifService, make(map[string]IssuerConnector), "server")
deploymentService := NewDeploymentService(jobRepo, targetRepo, agentRepo, certRepo, auditService, notifService)
return NewJobService(jobRepo, renewalService, deploymentService, logger)
+179 -37
View File
@@ -24,6 +24,7 @@ type RenewalService struct {
auditService *AuditService
notificationSvc *NotificationService
issuerRegistry map[string]IssuerConnector
keygenMode string // "agent" (default) or "server" (demo only)
}
// IssuerConnector defines the service-layer interface for interacting with certificate issuers.
@@ -53,7 +54,11 @@ func NewRenewalService(
auditService *AuditService,
notificationSvc *NotificationService,
issuerRegistry map[string]IssuerConnector,
keygenMode string,
) *RenewalService {
if keygenMode == "" {
keygenMode = "agent"
}
return &RenewalService{
certRepo: certRepo,
jobRepo: jobRepo,
@@ -61,6 +66,7 @@ func NewRenewalService(
auditService: auditService,
notificationSvc: notificationSvc,
issuerRegistry: issuerRegistry,
keygenMode: keygenMode,
}
}
@@ -232,13 +238,13 @@ func (s *RenewalService) updateCertExpiryStatus(ctx context.Context, cert *domai
}
}
// ProcessRenewalJob executes a renewal job: generate CSR, call issuer, store new version,
// update cert status, and create deployment jobs for targets.
// ProcessRenewalJob executes a renewal job. Behavior depends on keygen mode:
//
// V1 Architecture Note: For the Local CA issuer, the control plane generates a server-side
// ephemeral key + CSR. The private key is stored in the CertificateVersion.CSRPEM field
// so agents can retrieve it for deployment. In V2+ with ACME/external CAs, agents will
// generate keys locally and submit CSRs, so private keys never leave the target infrastructure.
// Agent mode (default, production): Sets job to AwaitingCSR. The agent generates keys
// locally, submits a CSR, and the server signs it. Private keys never leave the agent.
//
// Server mode (demo only, Local CA): Server generates RSA key + CSR, signs via issuer,
// stores cert version with private key so agent can retrieve it for deployment.
func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job) error {
// Update job status to in-progress
if err := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusRunning, ""); err != nil {
@@ -259,14 +265,50 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
return fmt.Errorf("certificate has no issuer assigned")
}
connector, ok := s.issuerRegistry[issuerID]
_, ok := s.issuerRegistry[issuerID]
if !ok {
s.failJob(ctx, job, fmt.Sprintf("issuer connector not found for %s", issuerID))
return fmt.Errorf("issuer connector not found for %s", issuerID)
}
// Generate server-side RSA key + CSR for this renewal
// V1: server generates ephemeral key for Local CA. V2+: agent generates key locally.
// Branch on keygen mode
if s.keygenMode == "agent" {
return s.processRenewalAgentKeygen(ctx, job, cert)
}
return s.processRenewalServerKeygen(ctx, job, cert)
}
// processRenewalAgentKeygen sets the job to AwaitingCSR so an agent can generate keys
// locally and submit a CSR. The server never touches the private key.
func (s *RenewalService) processRenewalAgentKeygen(ctx context.Context, job *domain.Job, cert *domain.ManagedCertificate) error {
// Transition job to AwaitingCSR — agent will pick this up during work polling
if err := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusAwaitingCSR, ""); err != nil {
return fmt.Errorf("failed to set job to AwaitingCSR: %w", err)
}
// Update certificate status
cert.Status = domain.CertificateStatusRenewalInProgress
cert.UpdatedAt = time.Now()
if err := s.certRepo.Update(ctx, cert); err != nil {
fmt.Printf("failed to update cert status for %s: %v\n", cert.ID, err)
}
// Record audit event
_ = s.auditService.RecordEvent(ctx, "system", domain.ActorTypeSystem,
"renewal_awaiting_csr", "certificate", job.CertificateID,
map[string]interface{}{"job_id": job.ID, "keygen_mode": "agent"})
return nil
}
// processRenewalServerKeygen is the legacy server-side keygen flow for Local CA demo.
// The server generates an ephemeral RSA key + CSR, signs via issuer, and stores the
// private key in the cert version so agents can retrieve it for deployment.
// WARNING: Private keys touch the control plane. Use only for development/demo.
func (s *RenewalService) processRenewalServerKeygen(ctx context.Context, job *domain.Job, cert *domain.ManagedCertificate) error {
connector := s.issuerRegistry[cert.IssuerID]
// Generate server-side RSA key + CSR
privKey, err := rsa.GenerateKey(rand.Reader, 2048)
if err != nil {
s.failJob(ctx, job, fmt.Sprintf("key generation failed: %v", err))
@@ -291,7 +333,7 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
Bytes: csrDER,
}))
// Encode private key to PEM for storage (V1: stored so agent can retrieve for deployment)
// Encode private key to PEM for storage (server mode: stored so agent can retrieve)
privKeyPEM := string(pem.EncodeToMemory(&pem.Block{
Type: "RSA PRIVATE KEY",
Bytes: x509.MarshalPKCS1PrivateKey(privKey),
@@ -301,15 +343,10 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
result, err := connector.RenewCertificate(ctx, cert.CommonName, cert.SANs, csrPEM)
if err != nil {
s.failJob(ctx, job, fmt.Sprintf("issuer renewal failed: %v", err))
// Send failure notification
_ = s.notificationSvc.SendRenewalNotification(ctx, cert, false, err)
// Record audit event
_ = s.auditService.RecordEvent(ctx, "system", domain.ActorTypeSystem,
"renewal_job_failed", "certificate", job.CertificateID,
map[string]interface{}{"job_id": job.ID, "error": err.Error()})
return fmt.Errorf("issuer renewal failed: %w", err)
}
@@ -325,7 +362,7 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
NotAfter: result.NotAfter,
FingerprintSHA256: fingerprint,
PEMChain: result.CertPEM + "\n" + result.ChainPEM,
CSRPEM: privKeyPEM, // V1: stores private key for agent deployment
CSRPEM: privKeyPEM, // Server mode: stores private key for agent deployment
CreatedAt: time.Now(),
}
@@ -351,24 +388,7 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
}
// Create deployment jobs for each target
if len(cert.TargetIDs) > 0 {
for _, targetID := range cert.TargetIDs {
tid := targetID // capture loop variable
deployJob := &domain.Job{
ID: generateID("job"),
CertificateID: cert.ID,
Type: domain.JobTypeDeployment,
Status: domain.JobStatusPending,
TargetID: &tid,
MaxAttempts: 3,
ScheduledAt: time.Now(),
CreatedAt: time.Now(),
}
if err := s.jobRepo.Create(ctx, deployJob); err != nil {
fmt.Printf("failed to create deployment job for target %s: %v\n", targetID, err)
}
}
}
s.createDeploymentJobs(ctx, cert)
// Send success notification
if err := s.notificationSvc.SendRenewalNotification(ctx, cert, true, nil); err != nil {
@@ -379,14 +399,136 @@ func (s *RenewalService) ProcessRenewalJob(ctx context.Context, job *domain.Job)
_ = s.auditService.RecordEvent(ctx, "system", domain.ActorTypeSystem,
"renewal_job_completed", "certificate", job.CertificateID,
map[string]interface{}{
"job_id": job.ID,
"serial": result.Serial,
"not_after": result.NotAfter,
"job_id": job.ID,
"serial": result.Serial,
"not_after": result.NotAfter,
"keygen_mode": "server",
})
return nil
}
// CompleteAgentCSRRenewal is called when an agent submits a CSR for an AwaitingCSR job.
// It signs the CSR via the issuer connector, stores the cert version (without private key),
// completes the renewal job, and creates deployment jobs.
func (s *RenewalService) CompleteAgentCSRRenewal(ctx context.Context, job *domain.Job, cert *domain.ManagedCertificate, csrPEM string) error {
connector, ok := s.issuerRegistry[cert.IssuerID]
if !ok {
s.failJob(ctx, job, fmt.Sprintf("issuer connector not found for %s", cert.IssuerID))
return fmt.Errorf("issuer connector not found for %s", cert.IssuerID)
}
// Update job to running
if err := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusRunning, ""); err != nil {
return fmt.Errorf("failed to update job status: %w", err)
}
// Sign the agent-submitted CSR via issuer
result, err := connector.RenewCertificate(ctx, cert.CommonName, cert.SANs, csrPEM)
if err != nil {
s.failJob(ctx, job, fmt.Sprintf("issuer signing failed: %v", err))
_ = s.notificationSvc.SendRenewalNotification(ctx, cert, false, err)
_ = s.auditService.RecordEvent(ctx, "system", domain.ActorTypeSystem,
"renewal_job_failed", "certificate", job.CertificateID,
map[string]interface{}{"job_id": job.ID, "error": err.Error()})
return fmt.Errorf("issuer signing failed: %w", err)
}
fingerprint := computeCertFingerprint(result.CertPEM)
// Store cert version — CSRPEM holds the actual CSR (not the private key!)
version := &domain.CertificateVersion{
ID: generateID("certver"),
CertificateID: cert.ID,
SerialNumber: result.Serial,
NotBefore: result.NotBefore,
NotAfter: result.NotAfter,
FingerprintSHA256: fingerprint,
PEMChain: result.CertPEM + "\n" + result.ChainPEM,
CSRPEM: csrPEM, // Agent mode: stores actual CSR, not private key
CreatedAt: time.Now(),
}
if err := s.certRepo.CreateVersion(ctx, version); err != nil {
s.failJob(ctx, job, fmt.Sprintf("version creation failed: %v", err))
return fmt.Errorf("failed to create certificate version: %w", err)
}
// Update certificate status and expiry
cert.Status = domain.CertificateStatusActive
cert.ExpiresAt = result.NotAfter
now := time.Now()
cert.LastRenewalAt = &now
cert.UpdatedAt = now
if err := s.certRepo.Update(ctx, cert); err != nil {
s.failJob(ctx, job, fmt.Sprintf("cert update failed: %v", err))
return fmt.Errorf("failed to update certificate: %w", err)
}
// Mark job completed
if err := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusCompleted, ""); err != nil {
return fmt.Errorf("failed to update job status: %w", err)
}
// Create deployment jobs for each target
s.createDeploymentJobs(ctx, cert)
// Send success notification
if err := s.notificationSvc.SendRenewalNotification(ctx, cert, true, nil); err != nil {
fmt.Printf("failed to send renewal notification: %v\n", err)
}
// Record audit event
_ = s.auditService.RecordEvent(ctx, "system", domain.ActorTypeSystem,
"renewal_job_completed", "certificate", cert.ID,
map[string]interface{}{
"job_id": job.ID,
"serial": result.Serial,
"not_after": result.NotAfter,
"keygen_mode": "agent",
})
return nil
}
// createDeploymentJobs creates pending deployment jobs for each target associated with a cert.
func (s *RenewalService) createDeploymentJobs(ctx context.Context, cert *domain.ManagedCertificate) {
if len(cert.TargetIDs) == 0 {
return
}
for _, targetID := range cert.TargetIDs {
tid := targetID
deployJob := &domain.Job{
ID: generateID("job"),
CertificateID: cert.ID,
Type: domain.JobTypeDeployment,
Status: domain.JobStatusPending,
TargetID: &tid,
MaxAttempts: 3,
ScheduledAt: time.Now(),
CreatedAt: time.Now(),
}
if err := s.jobRepo.Create(ctx, deployJob); err != nil {
fmt.Printf("failed to create deployment job for target %s: %v\n", targetID, err)
}
}
}
// GetAwaitingCSRJobs returns all jobs in AwaitingCSR state for a given certificate.
func (s *RenewalService) GetAwaitingCSRJobs(ctx context.Context, certID string) ([]*domain.Job, error) {
jobs, err := s.jobRepo.ListByCertificate(ctx, certID)
if err != nil {
return nil, err
}
var awaiting []*domain.Job
for _, j := range jobs {
if j.Status == domain.JobStatusAwaitingCSR {
awaiting = append(awaiting, j)
}
}
return awaiting, nil
}
// failJob is a helper to mark a job as failed with an error message.
func (s *RenewalService) failJob(ctx context.Context, job *domain.Job, errMsg string) {
if updateErr := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusFailed, errMsg); updateErr != nil {
+12 -12
View File
@@ -30,7 +30,7 @@ func TestCheckExpiringCertificates_SendsThresholdAlerts(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create a cert expiring in 10 days
cert := &domain.ManagedCertificate{
@@ -112,7 +112,7 @@ func TestCheckExpiringCertificates_DeduplicatesAlerts(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create cert
cert := &domain.ManagedCertificate{
@@ -192,7 +192,7 @@ func TestCheckExpiringCertificates_SkipsRenewalInProgress(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create cert with RenewalInProgress status
cert := &domain.ManagedCertificate{
@@ -257,7 +257,7 @@ func TestCheckExpiringCertificates_UpdatesStatusToExpiring(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create active cert that will become expiring
// Use an issuer NOT in the registry so no renewal job is created (which would override status)
@@ -319,7 +319,7 @@ func TestCheckExpiringCertificates_UpdatesStatusToExpired(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create cert that is already expired
// Use an issuer NOT in the registry so no renewal job is created (which would override status)
@@ -381,7 +381,7 @@ func TestCheckExpiringCertificates_CreatesRenewalJob(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create expiring cert with registered issuer
cert := &domain.ManagedCertificate{
@@ -447,7 +447,7 @@ func TestCheckExpiringCertificates_SkipsWithoutIssuer(t *testing.T) {
// Empty issuer registry
issuerRegistry := map[string]IssuerConnector{}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create cert with unregistered issuer
cert := &domain.ManagedCertificate{
@@ -509,7 +509,7 @@ func TestCheckExpiringCertificates_SkipsDuplicateJobs(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create cert
cert := &domain.ManagedCertificate{
@@ -593,7 +593,7 @@ func TestProcessRenewalJob(t *testing.T) {
"iss-test": issuerConnector,
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create certificate
cert := &domain.ManagedCertificate{
@@ -689,7 +689,7 @@ func TestProcessRenewalJob_IssuerFailure(t *testing.T) {
"iss-test": issuerConnector,
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create certificate
cert := &domain.ManagedCertificate{
@@ -771,7 +771,7 @@ func TestRetryFailedJobs(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create failed job with attempts < max_attempts
failedJob := &domain.Job{
@@ -836,7 +836,7 @@ func TestProcessRenewalJob_NoCertificate(t *testing.T) {
"iss-test": &mockIssuerConnector{},
}
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry)
svc := NewRenewalService(certRepo, jobRepo, policyRepo, auditSvc, notifSvc, issuerRegistry, "server")
// Create job with non-existent certificate
job := &domain.Job{