Files
certctl/internal/domain/connector_test.go
T
shankar0123 87213128cc fix(security,domain): redact Agent.APIKeyHash from JSON wire shape (G-2)
Pre-G-2 internal/domain/connector.go::Agent::APIKeyHash was tagged
`json:"api_key_hash"` and shipped on every wire surface that returned
domain.Agent — GET /api/v1/agents (PagedResponse{Data: agents}),
GET /api/v1/agents/{id}, GET /api/v1/agents/retired, and the
POST /api/v1/agents registration response. Every authenticated client
(browser, CLI --json, MCP tool calls) received the SHA-256-of-the-API-key
string. The browser silently dropped it because web/src/api/types.ts
omits the field, but CLI and MCP consumers print full JSON so the hash
was visible there. Even though the value is a hash and not the plaintext
key, shipping it gives an attacker an offline brute-force target if the
API-key entropy is low (certctl doesn't enforce a minimum on operator-
supplied keys), and there's no business reason for any client to ever
receive it — the value is server-internal, used only for the lookup at
internal/repository/postgres/agent.go::GetByAPIKey. (Audit:
cat-s5-apikey_leak in coverage-gap-audit-2026-04-24-v5/unified-audit.md.)

We chose the audit's recommended fix (json:"-") plus a defense-in-depth
MarshalJSON plus a CI guardrail. Three layers because struct-tag
redaction alone is one rebase away from being silently reverted, the
custom MarshalJSON catches the case where a parent struct embeds Agent
under a different tag, and the CI grep blocks reintroduction at the spec
or frontend boundary even without a code review catching it.

Files changed:

Phase 1 — Domain redaction:
- internal/domain/connector.go: APIKeyHash tag flipped from
  `json:"api_key_hash"` to `json:"-"`. New Agent.MarshalJSON
  with value receiver + type-alias-recursion-break that explicitly
  zeroes APIKeyHash on the marshal-time copy. Long-form docblock
  explaining the G-2 closure rationale + cross-references to
  service.RegisterAgent (populator), repository.AgentRepository::
  GetByAPIKey (consumer), docs/architecture.md (DB-shape vs
  API-shape distinction), and the audit finding.

Phase 2 — Domain tests (5 test functions):
- internal/domain/connector_test.go: TestAgent_MarshalJSON_RedactsAPIKeyHash
  pins the marshal-boundary contract on a value receiver. ...RedactsViaPointer
  pins the *Agent path. ...RedactsInSlice pins the []Agent path that the
  ListAgents handler actually emits via PagedResponse. ...DoesNotMutateReceiver
  pins the by-value-receiver contract so a future refactor that switches
  to pointer-receiver gets caught. ...RoundTrip pins the wire-shape
  guarantee that APIKeyHash is dropped on encode and cannot reappear on
  decode. Single sentinel value ("sha256:LEAKED-CREDENTIAL-DERIVATIVE-
  SENTINEL") flows through every fixture for grep-ability on regression.

Phase 3 — Handler tests (4 test functions):
- internal/api/handler/agent_handler_test.go: TestListAgents_DoesNotLeakAPIKeyHash,
  TestGetAgent_DoesNotLeakAPIKeyHash, TestRegisterAgent_DoesNotLeakAPIKeyHash,
  TestListRetiredAgents_DoesNotLeakAPIKeyHash. Each asserts (a) the
  literal substring "api_key_hash" is absent from the httptest-captured
  body, (b) the leak sentinel value is absent, (c) the non-leaked fields
  ARE present (sanity that the handler is serving real data, not just
  empty payloads). Shared sentinel "sha256:LEAKED-CREDENTIAL-DERIVATIVE-
  HANDLER-SENTINEL" so a single grep over a failing test's output
  identifies the leak surface immediately.

Phase 4 — Spec / docs:
- api/openapi.yaml: api_key_hash property REMOVED from Agent schema
  (was at line 3690). Inline G-2 comment naming the closure + the
  database-vs-API-shape distinction so a future spec edit doesn't
  silently re-introduce the field.
- docs/architecture.md: ER-diagram block already documents the agents
  table including api_key_hash (DB shape — correct). Added a sibling
  note paragraph immediately below the diagram explaining that several
  columns are intentionally server-internal (api_key_hash redaction
  + issuers.config / deployment_targets.config encrypted shadow), with
  cross-references to the redaction enforcement site, the OpenAPI
  schema, the frontend interface, and the CI guardrail.
- web/src/api/types.ts: Agent interface unchanged in shape (already
  omitted the field) but added a leading comment block explaining
  WHY the omission is intentional — stops a future frontend dev from
  "completing" the interface from the OpenAPI spec or the Go struct.

Phase 5 — CI guardrail:
- .github/workflows/ci.yml: new "Forbidden api_key_hash JSON-shape
  regression guard (G-2)" step. Scoped patterns catch the actual
  regression shapes — Go struct tag (json:"api_key_hash"), frontend
  interface declaration, OpenAPI schema property, YAML enum/array
  membership. Repository / migration / seed / service / integration /
  unit-test / comment lines exempt. Verified locally on the real tree
  (passes) and against 4 synthetic regression patterns (each fires
  the guardrail). Mirrors the G-1 pattern from .github/workflows/
  ci.yml lines 47-108.

Phase 5b — Sweep verification (no changes, results documented for the
next reader):
- internal/api/middleware/audit.go: doesn't serialize Agent struct;
  records request body only. No leak.
- service.RegisterAgent audit-event payload: `map[string]interface{}{
  "name": name, "hostname": hostname}` — name + hostname only,
  no APIKeyHash. No leak.
- All 9 slog sites that mention agent: scalar attrs only ("agent_id",
  "error", "agent_hostname"), never the full struct. No leak.
- internal/mcp, internal/cli, cmd/cli, cmd/mcp-server: zero matches
  for APIKeyHash / api_key_hash. Both pass server JSON verbatim, so
  the wire-side fix transitively closes them.

Verification (all gates pass):
- go build ./...
- go vet ./...
- go test -short ./... — every package green
- go test -short -race ./internal/domain/... ./internal/api/handler/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template smoke render — succeeds
- python3 yaml.safe_load on api/openapi.yaml — parses
- OpenAPI Agent schema scan: no api_key_hash property
- CI guardrail mirror: clean on real tree, fires on all 4 synthetic
  regression patterns
- Domain pkg coverage: Agent.MarshalJSON 100%, connector.go total 87.5%
- Handler pkg coverage: 79.2%

Sample response body (httptest captured during verification, GET
/api/v1/agents/{id} via the new handler test):

  {"id":"agent-demo","name":"demo-agent","hostname":"demo.host",
  "status":"Online","last_heartbeat_at":"2026-04-24T11:59:30Z",
  "registered_at":"2026-04-24T12:00:00Z","os":"linux",
  "architecture":"amd64","ip_address":"10.0.0.42",
  "version":"v2.0.49"}

Note the absence of any api_key_hash key, even though the in-memory
struct passed to the handler had APIKeyHash set to a sentinel.

Out of scope (intentionally untouched):
- internal/repository/postgres/agent.go SELECT/INSERT/UPDATE/scan
  paths and GetByAPIKey lookup — DB column stays, repo still
  populates the struct, auth lookup still works. The redaction is a
  marshal-boundary concern.
- migrations/000001_initial_schema.up.sql + migrations/seed_*.sql —
  DB schema and seed data unchanged.
- internal/service/agent.go::RegisterAgent — service-side hashing
  and persistence unchanged.
- Other domain types with potential credential-derivative fields
  (Issuer.Config, DeploymentTarget.Config, notifier configs). Not
  flagged by the audit; some are already protected (e.g.,
  DeploymentTarget.EncryptedConfig []byte `json:"-"`). File a
  separate audit pass if recon surfaces additional leaks.
- Per-resource DTO layer across every handler. Single audit
  finding, single domain type.
- A separate possible follow-up: the v2 RegisterAgent endpoint
  doesn't return the plaintext API key to the agent, which may
  mean self-bootstrap via POST /api/v1/agents is broken. Verified
  during recon; out of scope for G-2; should be its own ticket.

Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
      §2 P1 cluster, cat-s5-apikey_leak
      Audit recommendation: 'json:"-" or API-response DTO
      excluding APIKeyHash' — went with the json:"-" + MarshalJSON
      defense-in-depth pair plus CI guardrail and structural docs.
2026-04-25 01:56:26 +00:00

237 lines
9.0 KiB
Go

package domain
import (
"encoding/json"
"strings"
"testing"
"time"
)
// jsonMarshalDirect / jsonUnmarshalDirect are thin aliases for the
// stdlib encoding/json calls. They exist only to make the G-2 redaction
// tests below grep-friendly: any future test author searching for "how
// is APIKeyHash redaction tested" will land on these and the call sites,
// rather than having to grep through dozens of unrelated json.Marshal
// usages.
func jsonMarshalDirect(v interface{}) ([]byte, error) { return json.Marshal(v) }
func jsonUnmarshalDirect(data []byte, v interface{}) error { return json.Unmarshal(data, v) }
func containsSubstr(haystack, needle string) bool { return strings.Contains(haystack, needle) }
// TestAgent_IsRetired covers the I-004 soft-retirement predicate that gates
// which callers hide an agent row from active listings.
func TestAgent_IsRetired(t *testing.T) {
t.Run("nil receiver is not retired", func(t *testing.T) {
var a *Agent
if a.IsRetired() {
t.Fatalf("nil *Agent should not be retired")
}
})
t.Run("zero value is not retired", func(t *testing.T) {
a := &Agent{}
if a.IsRetired() {
t.Fatalf("zero Agent should not be retired")
}
})
t.Run("RetiredAt set is retired", func(t *testing.T) {
now := time.Now()
a := &Agent{RetiredAt: &now}
if !a.IsRetired() {
t.Fatalf("Agent with RetiredAt != nil must be retired")
}
})
}
// TestAgentDependencyCounts_HasDependencies verifies the preflight
// aggregation helper used by the 409 block path of DELETE /agents/{id}.
func TestAgentDependencyCounts_HasDependencies(t *testing.T) {
cases := []struct {
name string
counts AgentDependencyCounts
want bool
}{
{"all zero", AgentDependencyCounts{}, false},
{"active target", AgentDependencyCounts{ActiveTargets: 1}, true},
{"active cert", AgentDependencyCounts{ActiveCertificates: 1}, true},
{"pending job", AgentDependencyCounts{PendingJobs: 1}, true},
{"mixed", AgentDependencyCounts{ActiveTargets: 3, PendingJobs: 2}, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := tc.counts.HasDependencies(); got != tc.want {
t.Fatalf("HasDependencies()=%v want=%v counts=%+v", got, tc.want, tc.counts)
}
})
}
}
// G-2 (P1): the cat-s5-apikey_leak audit closure tests. Pre-G-2,
// Agent.APIKeyHash was tagged `json:"api_key_hash"` and shipped on
// every /api/v1/agents response — credential-derivative leak that gave
// offline brute-force targets to every authenticated client. Post-G-2
// the tag is "-" AND Agent.MarshalJSON zeroes the field on a marshal-
// time copy. These tests pin both layers of the defense:
//
// 1. A populated APIKeyHash is never present in the marshaled JSON.
// 2. The redaction holds on *Agent, on slice elements, and on a
// sentinel literal-value check (so even a future field that
// happens to contain the same hash string would not appear).
// 3. The marshal-time copy does not mutate the caller's original —
// receiver is by-value, but pin it explicitly so a future refactor
// that switches to pointer-receiver gets caught.
// 4. Round-trip preserves every other field (hash dropped on encode,
// cannot reappear on decode because the wire never carries it).
const g2LeakSentinel = "sha256:LEAKED-CREDENTIAL-DERIVATIVE-SENTINEL"
// TestAgent_MarshalJSON_RedactsAPIKeyHash is the marshal-boundary
// contract test: a single Agent value with a populated APIKeyHash must
// not emit the field name nor the sentinel value.
func TestAgent_MarshalJSON_RedactsAPIKeyHash(t *testing.T) {
t.Parallel()
a := Agent{
ID: "agent-test",
Name: "test-agent",
Hostname: "host.example",
Status: AgentStatusOnline,
RegisteredAt: time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC),
APIKeyHash: g2LeakSentinel,
OS: "linux",
Architecture: "amd64",
IPAddress: "10.0.0.1",
Version: "v2.0.49",
}
out, err := jsonMarshalDirect(a)
if err != nil {
t.Fatalf("Marshal returned error: %v", err)
}
body := string(out)
if containsSubstr(body, "api_key_hash") {
t.Errorf("marshaled body contains \"api_key_hash\" key — G-2 leak regressed:\n%s", body)
}
if containsSubstr(body, "APIKeyHash") {
t.Errorf("marshaled body contains \"APIKeyHash\" — type-alias redaction broke:\n%s", body)
}
if containsSubstr(body, g2LeakSentinel) {
t.Errorf("marshaled body contains the leak sentinel %q — value redaction broke:\n%s", g2LeakSentinel, body)
}
// Sanity: every OTHER non-zero field IS present (this guards against
// the type-alias trick accidentally dropping siblings).
for _, want := range []string{"agent-test", "test-agent", "host.example", "Online", "linux", "amd64", "10.0.0.1", "v2.0.49"} {
if !containsSubstr(body, want) {
t.Errorf("marshaled body missing expected field value %q:\n%s", want, body)
}
}
}
// TestAgent_MarshalJSON_RedactsViaPointer covers the *Agent path that
// handlers hit when calling JSON(w, http.StatusOK, agent) with a *Agent
// from svc.GetAgent. A value-receiver MarshalJSON is reachable from
// pointer values via reflect; this test pins that contract.
func TestAgent_MarshalJSON_RedactsViaPointer(t *testing.T) {
t.Parallel()
a := &Agent{ID: "agent-x", APIKeyHash: g2LeakSentinel}
out, err := jsonMarshalDirect(a)
if err != nil {
t.Fatalf("Marshal *Agent returned error: %v", err)
}
if containsSubstr(string(out), g2LeakSentinel) {
t.Errorf("*Agent marshal leaked sentinel:\n%s", string(out))
}
if containsSubstr(string(out), "api_key_hash") {
t.Errorf("*Agent marshal contains \"api_key_hash\" key:\n%s", string(out))
}
}
// TestAgent_MarshalJSON_RedactsInSlice covers the []domain.Agent path
// the ListAgents handler emits via PagedResponse{Data: agents}. Each
// element must be redacted independently.
func TestAgent_MarshalJSON_RedactsInSlice(t *testing.T) {
t.Parallel()
agents := []Agent{
{ID: "agent-1", APIKeyHash: g2LeakSentinel + "-1"},
{ID: "agent-2", APIKeyHash: g2LeakSentinel + "-2"},
{ID: "agent-3", APIKeyHash: g2LeakSentinel + "-3"},
}
out, err := jsonMarshalDirect(agents)
if err != nil {
t.Fatalf("Marshal []Agent returned error: %v", err)
}
body := string(out)
if containsSubstr(body, "api_key_hash") {
t.Errorf("[]Agent marshal contains \"api_key_hash\" key:\n%s", body)
}
for i := 1; i <= 3; i++ {
sentinel := g2LeakSentinel + "-" + string(rune('0'+i))
if containsSubstr(body, sentinel) {
t.Errorf("[]Agent marshal leaked sentinel %q:\n%s", sentinel, body)
}
}
// Every agent ID is present — the redaction didn't accidentally
// strip the entire element.
for _, id := range []string{"agent-1", "agent-2", "agent-3"} {
if !containsSubstr(body, id) {
t.Errorf("[]Agent marshal missing element ID %q:\n%s", id, body)
}
}
}
// TestAgent_MarshalJSON_DoesNotMutateReceiver pins the by-value-receiver
// contract: marshaling must not zero APIKeyHash on the caller's struct,
// only on the marshal-time copy. This guards against a future refactor
// that switches to pointer receiver and breaks every code path that
// marshals an Agent and then re-uses it (e.g., audit-event payload
// construction immediately after returning the agent in a handler).
func TestAgent_MarshalJSON_DoesNotMutateReceiver(t *testing.T) {
t.Parallel()
a := Agent{ID: "agent-keep", APIKeyHash: g2LeakSentinel}
if _, err := jsonMarshalDirect(a); err != nil {
t.Fatalf("Marshal returned error: %v", err)
}
if a.APIKeyHash != g2LeakSentinel {
t.Errorf("MarshalJSON mutated caller's APIKeyHash: got %q want %q", a.APIKeyHash, g2LeakSentinel)
}
}
// TestAgent_MarshalJSON_RoundTrip pins the wire-shape contract: an
// Agent marshaled to JSON and unmarshaled back into a fresh Agent has
// every field preserved EXCEPT APIKeyHash, which the wire never carries.
// This double-confirms the redaction is a one-way guarantee at the
// serialization boundary, not an accidental on-decode behavior.
func TestAgent_MarshalJSON_RoundTrip(t *testing.T) {
t.Parallel()
now := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
hb := now.Add(-5 * time.Minute)
original := Agent{
ID: "agent-rt",
Name: "rt",
Hostname: "rt.host",
Status: AgentStatusOnline,
LastHeartbeatAt: &hb,
RegisteredAt: now,
APIKeyHash: g2LeakSentinel,
OS: "linux",
Architecture: "arm64",
IPAddress: "10.0.0.99",
Version: "v2.0.49",
}
out, err := jsonMarshalDirect(original)
if err != nil {
t.Fatalf("Marshal returned error: %v", err)
}
var got Agent
if err := jsonUnmarshalDirect(out, &got); err != nil {
t.Fatalf("Unmarshal returned error: %v", err)
}
if got.APIKeyHash != "" {
t.Errorf("APIKeyHash survived round-trip: got %q want empty (the wire must not carry it)", got.APIKeyHash)
}
if got.ID != original.ID || got.Name != original.Name || got.Hostname != original.Hostname {
t.Errorf("identity fields lost in round-trip: got %+v want %+v", got, original)
}
if got.OS != original.OS || got.Architecture != original.Architecture || got.IPAddress != original.IPAddress {
t.Errorf("metadata fields lost in round-trip: got %+v want %+v", got, original)
}
}