fix(agent,service): SEC-002 — validate certificate_id shape + contain key path

Sprint 1 unified-master-audit closure. Pre-fix the agent built its
on-disk key path via:

  keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")

migrations/000001_initial_schema.up.sql declares managed_certificates.id
as TEXT PRIMARY KEY with no shape constraint, so a compromised control
plane (or a poisoned database row) could deliver a job whose
certificate_id is '../../etc/passwd', '/absolute/path', a NUL-byte
payload, or a Windows-separator-laden string — driving arbitrary
file write or read on the agent host.

Fix (two ends; both load-bearing):

Server side:
  - New internal/validation/certificate_id.go: ValidateCertificateID
    pins the canonical TEXT-PK shape (^[A-Za-z0-9._-]{1,128}$, plus
    explicit '.'/'..' rejection).
  - CertificateService.Create now invokes ValidateCertificateID after
    the existing required-fields check; malformed IDs are refused
    before persistence or downstream job creation.

Agent side:
  - cmd/agent/keymem.go: validateAgentCertID mirrors the server-side
    shape regex. safeAgentKeyPath additionally asserts the joined
    path is contained within KeyDir via filepath.Rel — even if a
    future refactor bypasses the shape check, a path that escapes
    KeyDir fails closed.
  - poll.go + deploy.go: both filepath.Join call sites routed
    through safeAgentKeyPath; rejection surfaces via reportJobStatus
    so the control plane sees the failure.

Regression coverage:
  - internal/validation/certificate_id_test.go: production shapes
    accepted; explicit rejection table for empty, overlong, posix
    traversal, absolute, Windows traversal, Windows separator, NUL
    byte, newline/tab injection, drive prefix, space, unicode dots.
  - cmd/agent/keymem_test.go: validateAgentCertID acceptance +
    rejection tables; safeAgentKeyPath happy path + the 8 audit
    vectors plus empty-keyDir refusal.

Closes SEC-002.
This commit is contained in:
shankar0123
2026-05-16 03:31:59 +00:00
parent e6cfd756ac
commit 037dab7b6f
7 changed files with 381 additions and 4 deletions
+14 -1
View File
@@ -151,7 +151,20 @@ func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
// before any write touches disk. Also defer-clear the PEM buffer for
// the same reason — the encoded key isn't sensitive in transit (it's
// going to disk) but lingers on the heap if we don't.
keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
//
// SEC-002 closure (Sprint 1, 2026-05-16): safeAgentKeyPath validates
// the certificate_id shape AND asserts the joined path is contained
// within a.config.KeyDir. A crafted certificate_id like
// "../../etc/passwd" or "/abs/path" now fails closed before any
// disk I/O. See cmd/agent/keymem.go for the helper.
keyPath, kerr := safeAgentKeyPath(a.config.KeyDir, job.CertificateID)
if kerr != nil {
a.logger.Error("agent key path validation failed", "job_id", job.ID, "certificate_id", job.CertificateID, "error", kerr)
if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key path validation failed: %v", kerr)); reportErr != nil {
a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
}
return
}
if err := ensureAgentKeyDirSecure(filepath.Dir(keyPath)); err != nil {
a.logger.Error("agent key dir hardening failed", "job_id", job.ID, "error", err)
if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key dir hardening failed: %v", err)); reportErr != nil {