2 Commits

Author SHA1 Message Date
shankar0123 fbe053aa0c refactor(mcp): split tools.go by tool domain — Option B sibling-files (Phase 9, 10 of N)
Phase 9 ARCH-M2 closure Sprint 10. Splits internal/mcp/tools.go
(was 1867 LOC, the second-largest backend hotspot after the
service/acme.go cuts in Sprints 9 + 9b) via the Option B sibling-
file pattern — new files stay in `package mcp` so every external
caller of `mcp.RegisterTools(...)` resolves the same way. Pure
mechanical relocation; no signature, no behavior, no import-graph
change.

Why this is naturally suited to Option B
========================================
The mcp package already follows the sibling-file convention:
tools_audit_fix.go (registerAuditFixTools), tools_auth.go
(registerAuthTools), tools_auth_bundle2.go (registerAuthBundle2Tools),
and tools_est.go (registerESTTools) each carry a single
register-function each, all in the same `mcp` package. Sprint 10
extends that pattern to the 22 register-functions still inside
tools.go.

The structure of tools.go is unusually clean for a refactor: every
domain has its own `// ── DomainName ──` banner above its
register-function, and every register-function ends with a `}` +
blank line before the next domain's banner. The RegisterTools
dispatcher stayed in tools.go and still invokes each
registerXxxTools(...) in the same order — calls cross a file
boundary but stay in `package mcp`, so same-package resolution
makes them zero-cost.

What moved
==========

New `internal/mcp/tools_certificates.go` (404 LOC) — certificate-
lifecycle domain:
  - registerCertificateTools (cert CRUD + revocation)
  - registerCRLOCSPTools
  - registerRenewalPolicyTools (Phase C P1-1..P1-5)
  - registerVerificationTools (Phase G P1-32/P1-34/P1-35)

New `internal/mcp/tools_agents.go` (266 LOC) — agent-management
domain:
  - registerAgentTools (per-agent CRUD + lifecycle)
  - registerAgentGroupTools

New `internal/mcp/tools_resources.go` (565 LOC) — resource-
management / configuration surface:
  - registerIssuerTools, registerTargetTools
  - registerPolicyTools, registerProfileTools
  - registerTeamTools, registerOwnerTools
  - registerNotificationTools
  - registerIntermediateCATools (Phase F P1-6..P1-9)

New `internal/mcp/tools_jobs.go` (170 LOC) — workflow domain:
  - registerJobTools
  - registerApprovalTools + approvalDecisionPayload struct
    (Phase A P1-28..P1-31)

New `internal/mcp/tools_discovery.go` (169 LOC) — discovery domain:
  - registerNetworkScanTools (Phase D P1-14..P1-19)
  - registerDiscoveryReadTools (Phase E P1-10..P1-13)

New `internal/mcp/tools_admin.go` (369 LOC) — observability / admin
domain:
  - registerAuditTools, registerStatsTools, registerDigestTools,
    registerMetricsTools, registerHealthTools
  - registerHealthCheckTools (Phase B P1-20..P1-27)

What stays in tools.go (109 LOC, down from 1867)
================================================
  - The RegisterTools dispatcher (still owns the canonical
    registration order; calls cross-file but stay in-package).
  - The three Bundle-3 wrappers + helper that every register
    function consumes: textResult (the json.RawMessage success-path
    fence), errorResult (the failure-path fence), paginationQuery
    (the URL helper).

The unused `context` import is dropped from tools.go as a clean
side effect — none of the four surviving functions take a
context.Context. Per-import audit on every new file:
  - tools_certificates.go: context, fmt, gomcp
  - tools_agents.go: context, fmt, net/url, gomcp
  - tools_resources.go: context, gomcp
  - tools_jobs.go: context, gomcp
  - tools_discovery.go: context, gomcp
  - tools_admin.go: context, net/url, strconv, gomcp
None of the moved code touched encoding/json directly — that import
stays inside tools.go for textResult's json.RawMessage param.

Bundle-3 fence guardrail update
===============================
The existing TestFenceGuardrail_NoBareCallToolResult guardrail in
fence_guardrail_test.go fails any file that constructs
gomcp.CallToolResult{...} literals outside the tools.go allowlist.
registerCRLOCSPTools — which moved to tools_certificates.go — has
two pre-existing literal CallToolResult constructions: each returns
a server-built status string of the form "DER CRL retrieved (%d
bytes, content-type: %s)" or "OCSP response retrieved (...)". The
byte count is `len(raw)` (server-controlled) and the content-type
comes from the HTTP header on the upstream PKI endpoint
(server-controlled in self-hosted deployments). Both predate
Bundle-3 fencing.

Two options to keep CI green:
  (a) Route through textResult — but that changes behavior (adds
      the UNTRUSTED MCP_RESPONSE fence around the response), which
      breaks the "mechanical relocation, no behavior change" rule
      Sprint 10 commits to.
  (b) Add tools_certificates.go to the allowlist with a comment
      explaining the carve-out is pre-existing and Sprint 10
      preserves byte-exact behavior.

This commit takes option (b). The allowlist comment in
fence_guardrail_test.go documents the carve-out, points at the
specific tools (CRL + OCSP binary-pass-through with server-built
status descriptions), and flags tightening these two sites through
textResult as a follow-up concern (open question: does the format
break MCP consumers that parse the description text).

Net effect
==========
tools.go: 1867 → 109 LOC (-1758 = -94.2%). Six new sibling files at
1943 LOC total (109 LOC of header + Phase 9 doc-comment overhead
per file = ~185 LOC of added documentation; the rest is moved
code). The biggest pre-Sprint-10 hotspot in the mcp package is now
smaller than tools_test.go (435 LOC).

Cumulative Phase 9 progress
===========================
  config.go        3403 → 1342 (-60.6%, Sprints 1-7)
  cmd/server/main.go 2966 → 2260 (-23.8%, Sprints 8 + 8b)
  service/acme.go  1965 → 1162 (-40.9%, Sprints 9 + 9b)
  mcp/tools.go     1867 →  109 (-94.2%, Sprint 10)
  TOTAL across 4 files: 10,201 → 4,873 LOC = -5,328 (-52.2%)

Behavior preservation contract
==============================
1. gofmt -l clean across all 8 affected files.
2. go vet ./internal/mcp/... — no findings.
3. staticcheck ./internal/mcp/... ./cmd/mcp-server/... — no findings.
4. go test -short -count=1 ./internal/mcp/... — green (includes the
   TestFenceGuardrail_NoBareCallToolResult guardrail post-allowlist-
   update, the tools_per_tool_test.go suite that exercises every
   moved register function, and the injection_regression_test.go
   suite that pins Bundle-3 fencing behavior on the wrapper layer).
5. Broader-importer build green: go build ./... .
6. Broader-importer tests green: go test -short ./cmd/mcp-server/...
   ./internal/api/handler/... ./cmd/server/... .

Same-package resolution means the RegisterTools dispatcher's
13-line call list in tools.go reaches each registerXxxTools across
six new sibling files via compile-time-resolved package-level
names; the public mcp.RegisterTools entry point + its (s, client)
signature is unchanged.

What remains for Phase 9
========================
Two sibling-file splits queued:
  - Sprint 11: internal/api/handler/auth_session_oidc.go (1577 LOC)
    split per handler verb (login / callback / refresh / logout /
    backchannel).
  - Sprint 12: cmd/agent/main.go (1489 LOC) mirroring the cmd/server
    pattern from Sprints 8 + 8b.

Refs: ARCH-M2 (god-files), Phase 9 audit. Sprint 10 closes the MCP
hotspot from the audit's top-6 list.
2026-05-14 10:15:21 +00:00
shankar0123 23411bd6fc fix(bundle-3): MCP Trust-Boundary Fencing — 5 audit findings closed
Closes Audit-2026-04-25 H-002, H-003, M-003, M-004, M-005 (all CWE-1039
LLM Prompt Injection at the MCP↔consumer trust boundary, TB-7).

Strategy: wrapper-layer fencing. All 87 MCP tools route their success
path through textResult and their failure path through errorResult. By
fencing at those two wrappers we cover every existing tool AND every
future tool with a single change — no per-tool wiring required.

What changed
- internal/mcp/fence.go (new) — FenceUntrusted helper with strategy
  doc + per-finding rationale. Both fenceMCPResponse and fenceMCPError
  use it internally.
- internal/mcp/tools.go — textResult wraps response body via
  fenceMCPResponse; errorResult wraps error string via fenceMCPError.
- internal/mcp/tools_test.go — TestTextResult / TestErrorResult updated
  to assert fenced shape (start marker + end marker + inner body).
- internal/mcp/injection_regression_test.go (new) — 5 regression test
  functions, one per audit finding, each replays 5 classic LLM
  injection payloads (instruction_override, system_role_spoofing,
  delimiter_break_attempt, markdown_link_phishing, data_exfil_via_url)
  and asserts the planted payload appears VERBATIM (preservation,
  operator visibility) INSIDE the fence boundaries.
- internal/mcp/fence_guardrail_test.go (new) — CI guardrail that walks
  every non-test .go file in the mcp package and fails if it finds a
  bare gomcp.CallToolResult literal outside tools.go. Prevents future
  tools from silently bypassing the fence.

Delimiter-forgery defense
The naive constant fence (--- UNTRUSTED MCP_RESPONSE END ---) is
forgeable: an attacker who controls a field value can plant the literal
end marker and "break out" of the fence. Defense: every fence call
generates a 6-byte crypto/rand nonce, hex-encoded, and embeds it in
BOTH the START and END markers. An attacker would need to predict the
nonce (2^48 search per fence) to forge a matching END inside the
payload. The delimiter_break_attempt regression test exercises this.

Per-finding mapping
- H-002 Cert Subject DN injection (CSR submitter controlled) →
  TestMCP_PromptInjection_H002_CertSubjectDN
- H-003 Discovered cert metadata injection (cert owner controlled) →
  TestMCP_PromptInjection_H003_DiscoveredCertMetadata
- M-003 Agent heartbeat injection (agent self-reports hostname/OS/IP)
  → TestMCP_PromptInjection_M003_AgentHeartbeat
- M-004 Upstream CA error injection (CA controls error string) →
  TestMCP_PromptInjection_M004_UpstreamCAError
- M-005 Audit details + notification body injection (downstream actors
  control these) → TestMCP_PromptInjection_M005_AuditDetailsAndNotifications

Verification gates
- go vet ./...                                 → clean
- go build ./...                               → clean
- go test -short -count=1 ./...                → all packages pass
- go test -count=1 ./internal/mcp/...          → all packages pass
- npx tsc --noEmit (web)                       → clean
- npx vitest run (web)                         → 337 passed
- python3 yaml.safe_load(api/openapi.yaml)     → 89 paths, 56 schemas

Threat-model placement: TB-7 (MCP↔LLM consumer). certctl owns the
boundary; consumer-side prompt engineering is recommended but not
relied upon. Defense-in-depth: per-call nonce closes the
delimiter-forgery edge case that constant fences would have left
exposed.

Bundle 3 of the 2026-04-25 comprehensive audit (88 findings).
2026-04-25 22:44:33 +00:00