Files
shankar0123 fbe053aa0c refactor(mcp): split tools.go by tool domain — Option B sibling-files (Phase 9, 10 of N)
Phase 9 ARCH-M2 closure Sprint 10. Splits internal/mcp/tools.go
(was 1867 LOC, the second-largest backend hotspot after the
service/acme.go cuts in Sprints 9 + 9b) via the Option B sibling-
file pattern — new files stay in `package mcp` so every external
caller of `mcp.RegisterTools(...)` resolves the same way. Pure
mechanical relocation; no signature, no behavior, no import-graph
change.

Why this is naturally suited to Option B
========================================
The mcp package already follows the sibling-file convention:
tools_audit_fix.go (registerAuditFixTools), tools_auth.go
(registerAuthTools), tools_auth_bundle2.go (registerAuthBundle2Tools),
and tools_est.go (registerESTTools) each carry a single
register-function each, all in the same `mcp` package. Sprint 10
extends that pattern to the 22 register-functions still inside
tools.go.

The structure of tools.go is unusually clean for a refactor: every
domain has its own `// ── DomainName ──` banner above its
register-function, and every register-function ends with a `}` +
blank line before the next domain's banner. The RegisterTools
dispatcher stayed in tools.go and still invokes each
registerXxxTools(...) in the same order — calls cross a file
boundary but stay in `package mcp`, so same-package resolution
makes them zero-cost.

What moved
==========

New `internal/mcp/tools_certificates.go` (404 LOC) — certificate-
lifecycle domain:
  - registerCertificateTools (cert CRUD + revocation)
  - registerCRLOCSPTools
  - registerRenewalPolicyTools (Phase C P1-1..P1-5)
  - registerVerificationTools (Phase G P1-32/P1-34/P1-35)

New `internal/mcp/tools_agents.go` (266 LOC) — agent-management
domain:
  - registerAgentTools (per-agent CRUD + lifecycle)
  - registerAgentGroupTools

New `internal/mcp/tools_resources.go` (565 LOC) — resource-
management / configuration surface:
  - registerIssuerTools, registerTargetTools
  - registerPolicyTools, registerProfileTools
  - registerTeamTools, registerOwnerTools
  - registerNotificationTools
  - registerIntermediateCATools (Phase F P1-6..P1-9)

New `internal/mcp/tools_jobs.go` (170 LOC) — workflow domain:
  - registerJobTools
  - registerApprovalTools + approvalDecisionPayload struct
    (Phase A P1-28..P1-31)

New `internal/mcp/tools_discovery.go` (169 LOC) — discovery domain:
  - registerNetworkScanTools (Phase D P1-14..P1-19)
  - registerDiscoveryReadTools (Phase E P1-10..P1-13)

New `internal/mcp/tools_admin.go` (369 LOC) — observability / admin
domain:
  - registerAuditTools, registerStatsTools, registerDigestTools,
    registerMetricsTools, registerHealthTools
  - registerHealthCheckTools (Phase B P1-20..P1-27)

What stays in tools.go (109 LOC, down from 1867)
================================================
  - The RegisterTools dispatcher (still owns the canonical
    registration order; calls cross-file but stay in-package).
  - The three Bundle-3 wrappers + helper that every register
    function consumes: textResult (the json.RawMessage success-path
    fence), errorResult (the failure-path fence), paginationQuery
    (the URL helper).

The unused `context` import is dropped from tools.go as a clean
side effect — none of the four surviving functions take a
context.Context. Per-import audit on every new file:
  - tools_certificates.go: context, fmt, gomcp
  - tools_agents.go: context, fmt, net/url, gomcp
  - tools_resources.go: context, gomcp
  - tools_jobs.go: context, gomcp
  - tools_discovery.go: context, gomcp
  - tools_admin.go: context, net/url, strconv, gomcp
None of the moved code touched encoding/json directly — that import
stays inside tools.go for textResult's json.RawMessage param.

Bundle-3 fence guardrail update
===============================
The existing TestFenceGuardrail_NoBareCallToolResult guardrail in
fence_guardrail_test.go fails any file that constructs
gomcp.CallToolResult{...} literals outside the tools.go allowlist.
registerCRLOCSPTools — which moved to tools_certificates.go — has
two pre-existing literal CallToolResult constructions: each returns
a server-built status string of the form "DER CRL retrieved (%d
bytes, content-type: %s)" or "OCSP response retrieved (...)". The
byte count is `len(raw)` (server-controlled) and the content-type
comes from the HTTP header on the upstream PKI endpoint
(server-controlled in self-hosted deployments). Both predate
Bundle-3 fencing.

Two options to keep CI green:
  (a) Route through textResult — but that changes behavior (adds
      the UNTRUSTED MCP_RESPONSE fence around the response), which
      breaks the "mechanical relocation, no behavior change" rule
      Sprint 10 commits to.
  (b) Add tools_certificates.go to the allowlist with a comment
      explaining the carve-out is pre-existing and Sprint 10
      preserves byte-exact behavior.

This commit takes option (b). The allowlist comment in
fence_guardrail_test.go documents the carve-out, points at the
specific tools (CRL + OCSP binary-pass-through with server-built
status descriptions), and flags tightening these two sites through
textResult as a follow-up concern (open question: does the format
break MCP consumers that parse the description text).

Net effect
==========
tools.go: 1867 → 109 LOC (-1758 = -94.2%). Six new sibling files at
1943 LOC total (109 LOC of header + Phase 9 doc-comment overhead
per file = ~185 LOC of added documentation; the rest is moved
code). The biggest pre-Sprint-10 hotspot in the mcp package is now
smaller than tools_test.go (435 LOC).

Cumulative Phase 9 progress
===========================
  config.go        3403 → 1342 (-60.6%, Sprints 1-7)
  cmd/server/main.go 2966 → 2260 (-23.8%, Sprints 8 + 8b)
  service/acme.go  1965 → 1162 (-40.9%, Sprints 9 + 9b)
  mcp/tools.go     1867 →  109 (-94.2%, Sprint 10)
  TOTAL across 4 files: 10,201 → 4,873 LOC = -5,328 (-52.2%)

Behavior preservation contract
==============================
1. gofmt -l clean across all 8 affected files.
2. go vet ./internal/mcp/... — no findings.
3. staticcheck ./internal/mcp/... ./cmd/mcp-server/... — no findings.
4. go test -short -count=1 ./internal/mcp/... — green (includes the
   TestFenceGuardrail_NoBareCallToolResult guardrail post-allowlist-
   update, the tools_per_tool_test.go suite that exercises every
   moved register function, and the injection_regression_test.go
   suite that pins Bundle-3 fencing behavior on the wrapper layer).
5. Broader-importer build green: go build ./... .
6. Broader-importer tests green: go test -short ./cmd/mcp-server/...
   ./internal/api/handler/... ./cmd/server/... .

Same-package resolution means the RegisterTools dispatcher's
13-line call list in tools.go reaches each registerXxxTools across
six new sibling files via compile-time-resolved package-level
names; the public mcp.RegisterTools entry point + its (s, client)
signature is unchanged.

What remains for Phase 9
========================
Two sibling-file splits queued:
  - Sprint 11: internal/api/handler/auth_session_oidc.go (1577 LOC)
    split per handler verb (login / callback / refresh / logout /
    backchannel).
  - Sprint 12: cmd/agent/main.go (1489 LOC) mirroring the cmd/server
    pattern from Sprints 8 + 8b.

Refs: ARCH-M2 (god-files), Phase 9 audit. Sprint 10 closes the MCP
hotspot from the audit's top-6 list.
2026-05-14 10:15:21 +00:00

267 lines
11 KiB
Go

// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1
package mcp
import (
"context"
"fmt"
"net/url"
gomcp "github.com/modelcontextprotocol/go-sdk/mcp"
)
// Phase 9 ARCH-M2 closure Sprint 10 (2026-05-14): extracted from
// internal/mcp/tools.go via the Option B sibling-file pattern.
//
// This file groups the agent-management MCP tool domain: per-agent
// CRUD + lifecycle (registerAgentTools — register / list / get /
// retire / heartbeat / poll / claim / verify / discoveries) and the
// agent-group surface (registerAgentGroupTools — group CRUD +
// membership). Phase G P1-33 (POST /api/v1/agents/{id}/discoveries)
// stays intentionally absent from the MCP surface per the comment
// in tools.go::RegisterTools — that endpoint is the
// machine-to-machine path agents use to push filesystem-scan
// reports, not an operator-driven flow worth exposing to LLM
// consumers.
// ── Agents ──────────────────────────────────────────────────────────
func registerAgentTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_list_agents",
Description: "List all registered agents with status, OS, architecture, and version info.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input ListParams) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents", paginationQuery(input.Page, input.PerPage))
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_get_agent",
Description: "Get agent details including status, last heartbeat, OS, architecture, IP, and version.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetByIDInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents/"+input.ID, nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_register_agent",
Description: "Register a new agent. Requires name and hostname. Returns 409 if an agent with the same name already exists.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input RegisterAgentInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Post("/api/v1/agents", input)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_agent_heartbeat",
Description: "Send agent heartbeat with optional metadata (OS, architecture, IP, version). Returns 404 if agent not found.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input struct {
ID string `json:"id" jsonschema:"Agent ID"`
Version string `json:"version,omitempty" jsonschema:"Agent version"`
Hostname string `json:"hostname,omitempty" jsonschema:"Hostname"`
OS string `json:"os,omitempty" jsonschema:"Operating system"`
Architecture string `json:"architecture,omitempty" jsonschema:"CPU architecture"`
IPAddress string `json:"ip_address,omitempty" jsonschema:"IP address"`
}) (*gomcp.CallToolResult, any, error) {
body := map[string]string{}
if input.Version != "" {
body["version"] = input.Version
}
if input.Hostname != "" {
body["hostname"] = input.Hostname
}
if input.OS != "" {
body["os"] = input.OS
}
if input.Architecture != "" {
body["architecture"] = input.Architecture
}
if input.IPAddress != "" {
body["ip_address"] = input.IPAddress
}
data, err := c.Post("/api/v1/agents/"+input.ID+"/heartbeat", body)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_agent_submit_csr",
Description: "Submit a PEM-encoded CSR from an agent for signing.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input AgentCSRInput) (*gomcp.CallToolResult, any, error) {
body := map[string]string{"csr_pem": input.CSRPEM}
if input.CertificateID != "" {
body["certificate_id"] = input.CertificateID
}
data, err := c.Post("/api/v1/agents/"+input.AgentID+"/csr", body)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_agent_pickup_certificate",
Description: "Agent picks up a signed certificate after CSR has been processed.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input AgentPickupInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents/"+input.AgentID+"/certificates/"+input.CertID, nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_agent_get_work",
Description: "Get pending work items (deployment jobs, AwaitingCSR jobs) for an agent.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetByIDInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents/"+input.ID+"/work", nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_agent_report_job_status",
Description: "Agent reports completion or failure of an assigned job.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input AgentJobStatusInput) (*gomcp.CallToolResult, any, error) {
body := map[string]string{"status": input.Status}
if input.Error != "" {
body["error"] = input.Error
}
data, err := c.Post("/api/v1/agents/"+input.AgentID+"/jobs/"+input.JobID+"/status", body)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
// I-004: soft-retirement. DELETE /api/v1/agents/{id} returns 200 on a
// fresh retire (body echoes retired_at/already_retired/cascade/counts),
// 204 on an idempotent retire of an already-retired agent (do() in
// client.go normalizes that to {"status":"deleted"}), 409 when downstream
// dependencies block the retire and force wasn't set, 403 on sentinel
// agents, or 400 when force=true was sent without a reason. The tool
// forwards the raw handler response so the LLM operator sees the
// dependency counts and can decide whether to retry with force=true.
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_retire_agent",
Description: "Soft-retire an agent (DELETE /api/v1/agents/{id}). Sets retired_at + retired_reason on the row; the agent is filtered from the default listing and surfaces only via certctl_list_retired_agents. Default is a safety-gated soft-retire that returns 409 blocked_by_dependencies if the agent has active targets, active certificates, or pending jobs — the returned counts tell you what would be orphaned. Pass force=true to cascade through and retire those dependents too; force=true requires a non-empty reason (captured in the audit trail). Sentinel discovery agents (server-scanner, cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm) cannot be retired — the handler returns 403 unconditionally. Idempotent: retrying on an already-retired agent returns 204 without side effects.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input RetireAgentInput) (*gomcp.CallToolResult, any, error) {
// Client-side mirror of the handler's ErrForceReasonRequired contract
// (see internal/api/handler/agents.go) so the LLM gets an immediate,
// actionable error instead of a round-trip 400. Whitespace-only
// reasons are treated as empty — matches handler's TrimSpace check.
if input.Force && input.Reason == "" {
return errorResult(fmt.Errorf("reason is required when force=true"))
}
query := url.Values{}
if input.Force {
query.Set("force", "true")
}
if input.Reason != "" {
query.Set("reason", input.Reason)
}
data, err := c.DeleteWithQuery("/api/v1/agents/"+input.ID, query)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
// I-004: retired agents are filtered out of GET /api/v1/agents by default.
// The /agents/retired endpoint is the opt-in view — same pagination shape
// as the default listing, but filters to rows where retired_at IS NOT NULL.
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_list_retired_agents",
Description: "List soft-retired agents (GET /api/v1/agents/retired). These are agents that have been retired via certctl_retire_agent; retired_at and retired_reason are populated. Returned separately from certctl_list_agents so the default listing stays focused on operational agents.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input ListParams) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents/retired", paginationQuery(input.Page, input.PerPage))
if err != nil {
return errorResult(err)
}
return textResult(data)
})
}
// ── Agent Groups ────────────────────────────────────────────────────
func registerAgentGroupTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_list_agent_groups",
Description: "List agent groups with dynamic matching criteria (OS, architecture, IP CIDR, version).",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input ListParams) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agent-groups", paginationQuery(input.Page, input.PerPage))
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_get_agent_group",
Description: "Get agent group details including matching criteria.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetByIDInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agent-groups/"+input.ID, nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_create_agent_group",
Description: "Create a new agent group with dynamic matching criteria. Requires name.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input CreateAgentGroupInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Post("/api/v1/agent-groups", input)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_update_agent_group",
Description: "Update an agent group's name, description, or matching criteria.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input UpdateAgentGroupInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Put("/api/v1/agent-groups/"+input.ID, input)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_delete_agent_group",
Description: "Delete an agent group.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetByIDInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Delete("/api/v1/agent-groups/" + input.ID)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_list_agent_group_members",
Description: "List agents that are members of a group (by dynamic criteria and manual membership).",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetByIDInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agent-groups/"+input.ID+"/members", nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
}