mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:22:07 +00:00
7d2e7043b93c4633f81704126ed4446da8a2ee2d
26 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
fbe053aa0c |
refactor(mcp): split tools.go by tool domain — Option B sibling-files (Phase 9, 10 of N)
Phase 9 ARCH-M2 closure Sprint 10. Splits internal/mcp/tools.go
(was 1867 LOC, the second-largest backend hotspot after the
service/acme.go cuts in Sprints 9 + 9b) via the Option B sibling-
file pattern — new files stay in `package mcp` so every external
caller of `mcp.RegisterTools(...)` resolves the same way. Pure
mechanical relocation; no signature, no behavior, no import-graph
change.
Why this is naturally suited to Option B
========================================
The mcp package already follows the sibling-file convention:
tools_audit_fix.go (registerAuditFixTools), tools_auth.go
(registerAuthTools), tools_auth_bundle2.go (registerAuthBundle2Tools),
and tools_est.go (registerESTTools) each carry a single
register-function each, all in the same `mcp` package. Sprint 10
extends that pattern to the 22 register-functions still inside
tools.go.
The structure of tools.go is unusually clean for a refactor: every
domain has its own `// ── DomainName ──` banner above its
register-function, and every register-function ends with a `}` +
blank line before the next domain's banner. The RegisterTools
dispatcher stayed in tools.go and still invokes each
registerXxxTools(...) in the same order — calls cross a file
boundary but stay in `package mcp`, so same-package resolution
makes them zero-cost.
What moved
==========
New `internal/mcp/tools_certificates.go` (404 LOC) — certificate-
lifecycle domain:
- registerCertificateTools (cert CRUD + revocation)
- registerCRLOCSPTools
- registerRenewalPolicyTools (Phase C P1-1..P1-5)
- registerVerificationTools (Phase G P1-32/P1-34/P1-35)
New `internal/mcp/tools_agents.go` (266 LOC) — agent-management
domain:
- registerAgentTools (per-agent CRUD + lifecycle)
- registerAgentGroupTools
New `internal/mcp/tools_resources.go` (565 LOC) — resource-
management / configuration surface:
- registerIssuerTools, registerTargetTools
- registerPolicyTools, registerProfileTools
- registerTeamTools, registerOwnerTools
- registerNotificationTools
- registerIntermediateCATools (Phase F P1-6..P1-9)
New `internal/mcp/tools_jobs.go` (170 LOC) — workflow domain:
- registerJobTools
- registerApprovalTools + approvalDecisionPayload struct
(Phase A P1-28..P1-31)
New `internal/mcp/tools_discovery.go` (169 LOC) — discovery domain:
- registerNetworkScanTools (Phase D P1-14..P1-19)
- registerDiscoveryReadTools (Phase E P1-10..P1-13)
New `internal/mcp/tools_admin.go` (369 LOC) — observability / admin
domain:
- registerAuditTools, registerStatsTools, registerDigestTools,
registerMetricsTools, registerHealthTools
- registerHealthCheckTools (Phase B P1-20..P1-27)
What stays in tools.go (109 LOC, down from 1867)
================================================
- The RegisterTools dispatcher (still owns the canonical
registration order; calls cross-file but stay in-package).
- The three Bundle-3 wrappers + helper that every register
function consumes: textResult (the json.RawMessage success-path
fence), errorResult (the failure-path fence), paginationQuery
(the URL helper).
The unused `context` import is dropped from tools.go as a clean
side effect — none of the four surviving functions take a
context.Context. Per-import audit on every new file:
- tools_certificates.go: context, fmt, gomcp
- tools_agents.go: context, fmt, net/url, gomcp
- tools_resources.go: context, gomcp
- tools_jobs.go: context, gomcp
- tools_discovery.go: context, gomcp
- tools_admin.go: context, net/url, strconv, gomcp
None of the moved code touched encoding/json directly — that import
stays inside tools.go for textResult's json.RawMessage param.
Bundle-3 fence guardrail update
===============================
The existing TestFenceGuardrail_NoBareCallToolResult guardrail in
fence_guardrail_test.go fails any file that constructs
gomcp.CallToolResult{...} literals outside the tools.go allowlist.
registerCRLOCSPTools — which moved to tools_certificates.go — has
two pre-existing literal CallToolResult constructions: each returns
a server-built status string of the form "DER CRL retrieved (%d
bytes, content-type: %s)" or "OCSP response retrieved (...)". The
byte count is `len(raw)` (server-controlled) and the content-type
comes from the HTTP header on the upstream PKI endpoint
(server-controlled in self-hosted deployments). Both predate
Bundle-3 fencing.
Two options to keep CI green:
(a) Route through textResult — but that changes behavior (adds
the UNTRUSTED MCP_RESPONSE fence around the response), which
breaks the "mechanical relocation, no behavior change" rule
Sprint 10 commits to.
(b) Add tools_certificates.go to the allowlist with a comment
explaining the carve-out is pre-existing and Sprint 10
preserves byte-exact behavior.
This commit takes option (b). The allowlist comment in
fence_guardrail_test.go documents the carve-out, points at the
specific tools (CRL + OCSP binary-pass-through with server-built
status descriptions), and flags tightening these two sites through
textResult as a follow-up concern (open question: does the format
break MCP consumers that parse the description text).
Net effect
==========
tools.go: 1867 → 109 LOC (-1758 = -94.2%). Six new sibling files at
1943 LOC total (109 LOC of header + Phase 9 doc-comment overhead
per file = ~185 LOC of added documentation; the rest is moved
code). The biggest pre-Sprint-10 hotspot in the mcp package is now
smaller than tools_test.go (435 LOC).
Cumulative Phase 9 progress
===========================
config.go 3403 → 1342 (-60.6%, Sprints 1-7)
cmd/server/main.go 2966 → 2260 (-23.8%, Sprints 8 + 8b)
service/acme.go 1965 → 1162 (-40.9%, Sprints 9 + 9b)
mcp/tools.go 1867 → 109 (-94.2%, Sprint 10)
TOTAL across 4 files: 10,201 → 4,873 LOC = -5,328 (-52.2%)
Behavior preservation contract
==============================
1. gofmt -l clean across all 8 affected files.
2. go vet ./internal/mcp/... — no findings.
3. staticcheck ./internal/mcp/... ./cmd/mcp-server/... — no findings.
4. go test -short -count=1 ./internal/mcp/... — green (includes the
TestFenceGuardrail_NoBareCallToolResult guardrail post-allowlist-
update, the tools_per_tool_test.go suite that exercises every
moved register function, and the injection_regression_test.go
suite that pins Bundle-3 fencing behavior on the wrapper layer).
5. Broader-importer build green: go build ./... .
6. Broader-importer tests green: go test -short ./cmd/mcp-server/...
./internal/api/handler/... ./cmd/server/... .
Same-package resolution means the RegisterTools dispatcher's
13-line call list in tools.go reaches each registerXxxTools across
six new sibling files via compile-time-resolved package-level
names; the public mcp.RegisterTools entry point + its (s, client)
signature is unchanged.
What remains for Phase 9
========================
Two sibling-file splits queued:
- Sprint 11: internal/api/handler/auth_session_oidc.go (1577 LOC)
split per handler verb (login / callback / refresh / logout /
backchannel).
- Sprint 12: cmd/agent/main.go (1489 LOC) mirroring the cmd/server
pattern from Sprints 8 + 8b.
Refs: ARCH-M2 (god-files), Phase 9 audit. Sprint 10 closes the MCP
hotspot from the audit's top-6 list.
|
||
|
|
21aeed4f4e |
legal: addlicense headers + normalize legacy variants (Phase 0 RED-4)
Phase 0 closure (Path B2, post-rewrite):
addlicense sweep — adds the canonical certctl LLC copyright + BUSL-1.1
SPDX header to every production Go file. Template:
// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1
Coverage: 338 / 338 production Go files (cmd/ + internal/, excluding
*_test.go and **/testdata/**). Pre-sweep coverage was 22 / 338 (6.5%);
post-sweep is 338 / 338 (100%).
Normalized 22 pre-existing legacy headers (`// Copyright (c) certctl`
+ `// SPDX-License-Identifier: BSL-1.1`) and 1 file using a
`Certctl Contributors` attribution. The legacy SPDX ID `BSL-1.1`
is non-standard; the official SPDX identifier for Business Source
License 1.1 is `BUSL-1.1` (capital U). All 338 files now share the
canonical form.
Generated via:
addlicense -c "certctl LLC" -y 2026 \
-f cowork/legal/copyright-header.tpl \
-ignore '**/testdata/**' -ignore '**/*_test.go' \
cmd/ internal/
Verification:
find cmd internal -name '*.go' -not -name '*_test.go' \
-not -path '*/testdata/*' \
-exec grep -L '^// Copyright 2026 certctl LLC' {} \; | wc -l
Returns: 0
gofmt clean. Header additions are comments only, no compile impact.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-RED-4
|
||
|
|
0152bdf567 |
fix(auth/rbac): scope-aware ActorRole revoke (A-4)
HIGH-10's UNIQUE (actor, role, scope_type, scope_id, tenant) uniqueness
extension lets an operator grant the same role to the same actor at
multiple scopes (e.g. r-operator on profile=p-acme AND profile=p-globex).
But ActorRoleRepository.Revoke's WHERE clause omitted (scope_type,
scope_id) — a single call deleted every variant. Selective revoke was
unrepresentable; operators had to drop all and re-grant N-1, opening
a race window where the actor's access was briefly different.
Closure across all layers (handler → service → repo → MCP → GUI client),
preserving the legacy "revoke all variants" contract for unmodified
callers:
internal/repository/auth.go
- New ActorRoleRevokeOptions struct. Zero value = legacy semantic;
non-empty ScopeType narrows to one variant.
- New ErrActorRoleNotFound sentinel for scoped no-match (HTTP 404).
internal/repository/postgres/auth.go
- Revoke signature extended with opts. Empty opts.ScopeType uses
the legacy SQL (no scope WHERE), zero-row delete = no error.
- Non-empty narrows with `scope_type = $5 AND scope_id IS NOT
DISTINCT FROM $6` — the IS-NOT-DISTINCT-FROM is load-bearing,
vanilla `=` would silently miss the (global, NULL) case because
NULL ≠ NULL in standard SQL.
- Selective revoke with zero matching rows returns
ErrActorRoleNotFound; operators get feedback on typos.
internal/service/auth/actor_role_service.go
- Revoke takes opts. Audit row's details map records the scope so
SIEMs can distinguish wide-vs-selective revokes:
`scope: "all_variants"` for the legacy path, or
`scope_type` + `scope_id` for selective. Privilege check
(auth.role.assign) and reserved-actor guard unchanged.
internal/api/handler/auth.go
- RevokeRoleFromKey parses optional `?scope_type=` / `?scope_id=`
query params via new parseRevokeScope helper.
- Validation mirrors AssignRoleToKey: scope_id forbidden with
scope_type=global, required with profile/issuer, invalid
scope_type → 400. scope_id without scope_type also → 400.
- writeAuthError maps ErrActorRoleNotFound to 404.
internal/mcp/tools_auth.go + types.go
- AuthRevokeKeyRoleInput gains optional ScopeType + ScopeID with
jsonschema descriptions explaining the dual-mode contract.
- Tool call site appends URL-encoded query params when ScopeType
is set; legacy callers (no scope_type) emit the bare DELETE
path unchanged.
web/src/api/client.ts
- authRevokeKeyRole signature: optional 3rd argument
`{ scope_type?, scope_id? }`. Pre-A-4 call sites (no opts arg)
keep firing the bare DELETE — fully backward compatible. The
GUI KeysPage's per-row revoke button (still one row per role,
pre-Fix-12) continues to use the legacy shape; future GUI work
can pass scope params for per-variant rows.
docs/operator/rbac.md
- New "Revoke: legacy 'all variants' vs scope-selective" subsection
under "From the HTTP API" with curl examples for both modes plus
the audit-row payload shape that lets SOC/SIEM tell them apart.
Regression coverage:
Repository (testcontainers, skipped under -short — 6 tests in
internal/repository/postgres/auth_revoke_scope_test.go):
TestRevokeActorRole_NoOpts_RemovesAllVariants
TestRevokeActorRole_WithScope_RemovesOnlyMatching
TestRevokeActorRole_WithGlobalScope_RemovesOnlyGlobal — pins the
IS-NOT-DISTINCT-FROM branch (global, NULL)
TestRevokeActorRole_NoMatch_ReturnsNotFound — pins the new sentinel
TestRevokeActorRole_NoOpts_NoMatch_IsNoOp — pins the legacy
idempotence contract
TestRevokeActorRole_IssuerScope_RemovesOnlyMatching — pin the
issuer-scope half (profile + issuer are symmetric scope types)
Handler (7 new tests in auth_test.go):
TestAuthHandler_RevokeRoleFromKey — extended to assert no scope
filter is forwarded when query string is empty (legacy behaviour)
TestAuthHandler_RevokeRoleFromKey_A4_ScopedProfile
TestAuthHandler_RevokeRoleFromKey_A4_ScopedGlobal
TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithGlobal
TestAuthHandler_RevokeRoleFromKey_A4_RejectsMissingScopeID
TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithoutScopeType
TestAuthHandler_RevokeRoleFromKey_A4_RejectsInvalidScopeType
TestAuthHandler_RevokeRoleFromKey_A4_ScopedNotFoundReturns404
MCP (2 new table rows in tools_per_tool_test.go):
Scoped revoke with scope_type=profile + scope_id=p-acme →
`?scope_type=profile&scope_id=p-acme`
Scoped revoke with scope_type=global (no scope_id) →
`?scope_type=global`
Service-layer test plumbing (service_test.go) updated for new opts
arg: 4 existing call sites pass repository.ActorRoleRevokeOptions{}
to keep their pre-A-4 semantics; the fakeActorRoleRepo.Revoke
implementation now mirrors the postgres scope-aware behaviour
(legacy zero-value vs scoped narrowing + ErrActorRoleNotFound on
no-match).
Verify gate green: gofmt clean, go vet clean, go test -short across
repository/postgres, service/auth, api/handler, and mcp. The
pre-existing KeysPage.test.tsx failure observed on the baseline
commit (reproduced via `git stash` earlier in Fix 03) is unrelated;
my client.ts change adds an optional third argument and is fully
backward-compatible.
Spec at cowork/auth-bundles-fixes-2026-05-11/04-high-actor-role-revoke-scope.md.
Audit doc updated: new row A-4 (2026-05-11) CLOSED appended to the
status table at the bottom of cowork/auth-bundles-audit-2026-05-10.md.
Operator-visible advisory in CHANGELOG.md v2.1.0 release notes under
Security (non-BREAKING — legacy callers are unchanged).
Depends on Fix 01 (the scope-aware EffectivePermissions read path on
branch fix/audit-2026-05-11/crit-actor-role-scope-reads). This fix
makes the inverse op selectively reversible; without Fix 01 the read
side would mis-evaluate scoped grants anyway, making selective revoke
moot at runtime.
|
||
|
|
ca31232ad2 |
feat(mcp): 11 audit-fix MCP tools — approvals, break-glass, bootstrap, audit-category (MED-13)
Audit 2026-05-10 MED-13 closure.
WHAT.
11 new MCP tools rounding out the operator surface for workflows
that previously had GUI + CLI coverage but no MCP equivalent:
Approval workflow (4):
certctl_approval_list GET /v1/approvals approval.read
certctl_approval_get GET /v1/approvals/{id} approval.read
certctl_approval_approve POST /v1/approvals/{id}/approve approval.approve
certctl_approval_reject POST /v1/approvals/{id}/reject approval.reject
Break-glass credential admin (4):
certctl_breakglass_list GET /v1/auth/breakglass/credentials
certctl_breakglass_set_password POST /v1/auth/breakglass/credentials
certctl_breakglass_unlock POST /v1/auth/breakglass/credentials/{actor_id}/unlock
certctl_breakglass_remove DELETE /v1/auth/breakglass/credentials/{actor_id}
All gated auth.breakglass.admin; surface invisible (404 not 403)
when CERTCTL_BREAKGLASS_ENABLED=false.
Bootstrap (2):
certctl_bootstrap_status GET /v1/auth/bootstrap (auth-exempt; safe probe)
certctl_bootstrap_consume POST /v1/auth/bootstrap (auth-exempt; one-shot mint)
Audit category filter (1):
certctl_audit_list_with_category GET /v1/audit?category=<cat> audit.read
WHY.
certctl_bootstrap_consume is the load-bearing day-0 primitive: a
fresh server with no admin actors lets the holder of CERTCTL_BOOTSTRAP_TOKEN
mint a fresh admin API key. Exposing it via MCP without a security
gate would let a downstream caller mint admin from any chat
transcript / log surface that captured the bootstrap token. The
tool description carries an explicit cautious-wording comment:
CAUTION: NEVER WIRE THIS TO AUTONOMOUS OPERATION. A leaked
bootstrap token from any log, telemetry, or chat-transcript
surface lets a downstream caller mint a fresh admin API key
bypassing every other access-control gate. Run this manually,
exactly once, from a trusted shell.
Similarly certctl_breakglass_set_password's description flags
that the password crosses the MCP transport in plaintext; the
server-side handler hashes with Argon2id before persisting + the
audit row redacts, but client-side logging must NEVER capture the
payload.
HOW.
internal/mcp/tools_audit_fix.go (NEW):
registerAuditFixTools(s, c) — declares the 11 tools via
gomcp.AddTool. Each tool routes through the existing Client.Get/
Post/Delete helpers; the server-side rbacGate wrappers (or
auth-exempt allowlist, for bootstrap) handle authorization.
internal/mcp/types.go:
Adds 5 input structs:
ApprovalIDInput (get/approve/reject)
BreakglassActorIDInput (unlock/remove)
BreakglassSetPasswordInput (set_password — flagged plaintext)
BootstrapConsumeInput (token + key_name; cautious comment)
AuditListWithCategoryInput (category + optional limit/since/until/actor_id)
Each tagged with jsonschema descriptions for LLM tool discovery.
internal/mcp/tools.go:
RegisterTools now calls registerAuditFixTools after the existing
Bundle 2 Phase 9 registrar.
internal/mcp/tools_per_tool_test.go:
allHappyPathCases extended with 11 new entries. The existing
TestMCP_AllTools_HappyPath dispatches each tool via the in-memory
MCP transport against a 2xx mock backend and asserts the
wrapper-layer fence wraps the response; TestMCP_AllTools_ErrorPath
dispatches against a 5xx mock and asserts MCP_ERROR fence.
TestMCP_RegisterTools_DispatchableToolCount confirms every new
tool is dispatchable by name.
VERIFY.
- go vet ./internal/mcp/... PASS
- go test -short -count=1
-run 'TestMCP_AllTools_HappyPath|TestMCP_AllTools_ErrorPath|
TestMCP_RegisterTools_DispatchableToolCount'
./internal/mcp/... PASS
- go test -short -count=1 ./internal/mcp/... PASS (0.3s)
Refs: cowork/auth-bundles-audit-2026-05-10.md MED-13
cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 4
|
||
|
|
b09bd0984a |
auth-bundle-2 Phase 9: 11 OIDC + session MCP tools (Phase-5 surface parity)
Closes Phase 9 of cowork/auth-bundle-2-prompt.md. Every Phase-5 HTTP
endpoint now has a matching MCP tool so operators driving certctl
from Claude / VS Code / any MCP client get the same OIDC-provider +
group-mapping + session management capability the GUI + CLI already
expose.
Coverage map (each tool → HTTP endpoint → permission)
=====================================================
certctl_auth_list_oidc_providers GET /v1/auth/oidc/providers auth.oidc.list
certctl_auth_get_oidc_provider GET /v1/auth/oidc/providers (filtered) auth.oidc.list
certctl_auth_create_oidc_provider POST /v1/auth/oidc/providers auth.oidc.create
certctl_auth_update_oidc_provider PUT /v1/auth/oidc/providers/{id} auth.oidc.edit
certctl_auth_delete_oidc_provider DELETE /v1/auth/oidc/providers/{id} auth.oidc.delete
certctl_auth_refresh_oidc_provider POST /v1/auth/oidc/providers/{id}/refresh auth.oidc.edit
certctl_auth_list_group_mappings GET /v1/auth/oidc/group-mappings?provider_id auth.oidc.list
certctl_auth_add_group_mapping POST /v1/auth/oidc/group-mappings auth.oidc.edit
certctl_auth_remove_group_mapping DELETE /v1/auth/oidc/group-mappings/{id} auth.oidc.edit
certctl_auth_list_sessions GET /v1/auth/sessions[?actor_id=&actor_type=] auth.session.list (own) | auth.session.list.all (other)
certctl_auth_revoke_session DELETE /v1/auth/sessions/{id} auth.session.revoke (or own-bypass)
Implementation notes
====================
internal/mcp/tools_auth_bundle2.go (NEW): 11 tools wired through three
focused register functions (registerAuthOIDCProviderTools,
registerAuthGroupMappingTools, registerAuthSessionTools). Every tool
routes through the existing Client (Get/Post/Put/Delete) so permission
gates fire server-side via the Phase-5 rbacGate wrappers — a non-admin
caller's MCP tool invocation gets whatever 403 the underlying HTTP
handler emits, not an MCP-side bypass.
Empty-id guard
--------------
Every path-id tool short-circuits to errorResult(fmt.Errorf("id is required"))
BEFORE the HTTP call. Defense against url.PathEscape("") collapsing a
singular op into the list endpoint (which would silently succeed against
a permissive backend). Same pattern across all 6 path-id tools (get,
update, delete, refresh provider; remove mapping; revoke session).
auth_get_oidc_provider list-then-filter
---------------------------------------
The Phase-5 HTTP API doesn't expose a singular GET /v1/auth/oidc/providers/{id}
endpoint — the GUI's OIDCProviderDetailPage fetches the full list and
filters in-process. The MCP tool mirrors that pattern exactly: GET the
list, JSON-decode the providers envelope, walk the array filtering by
id, return the matching raw JSON object on hit or an explicit "oidc
provider not found: <id>" error on miss. This keeps the MCP surface
in lockstep with the GUI's permission boundary (auth.oidc.list grants
"see any provider", as it does on the GUI) without inventing a new HTTP
endpoint.
internal/mcp/types.go (MODIFIED): 8 new input types matching the
Phase-5 wire shapes (oidcProviderRequest at internal/api/handler/auth_session_oidc.go).
client_secret on Update is optional — empty preserves the existing
ciphertext on the server, providing a value rotates. Mirrors the GUI's
edit-without-rotate UX from web/src/pages/auth/OIDCProviderDetailPage.tsx.
internal/mcp/tools.go (MODIFIED): registerAuthBundle2Tools wired into
RegisterTools alongside the Bundle 1 Phase 11 registerAuthTools.
Test coverage
=============
internal/mcp/tools_auth_bundle2_test.go (NEW), 5 test cases:
* TestAuthBundle2MCP_AllToolsRegister — registerAuthBundle2Tools
doesn't panic; catches duplicate-name regressions before CI.
* TestAuthBundle2MCP_PathsAndMethods — 11 cases (one per tool) +
the admin-other-actor variant of list_sessions; asserts the right
method + path + body + query string fires against the mock API.
* TestAuthBundle2MCP_ForbiddenSurfacesError — every tool's underlying
HTTP path returns a propagated error containing "forbidden" / "403"
when the mock returns 403, exercising the errorResult fence path.
* TestAuthBundle2MCP_GetProviderFiltersListByID — pins the list-then-
filter shape end-to-end with both the hit-and-return (returns the
matching raw JSON object) and miss-returns-error (sentinel string
"oidc provider not found") branches.
* TestAuthBundle2MCP_EmptyIDInputShortCircuits — pins the
strings.TrimSpace empty-id guard at the top of every path-id handler.
* TestAuthBundle2MCP_PromptCoverage — every tool the prompt enumerates
is also present in tools_per_tool_test.go's allHappyPathCases (so
the live-dispatch + 5xx error-path tests cover all 11 tools).
internal/mcp/tools_per_tool_test.go (MODIFIED): 11 new toolCase entries
in allHappyPathCases (live in-memory MCP dispatch + happy-path fence
shape + 5xx error-path fence shape) + a mock-API special case for
GET /api/v1/auth/oidc/providers that returns the right envelope shape
({"providers":[{"id":"op-okta",...}]}) so the get_oidc_provider tool's
in-process filter resolves under the live dispatch.
Verification
============
* gofmt + go vet — clean across internal/mcp/...
* go test -short -count=1 — green across internal/mcp + internal/auth/...
+ internal/api/handler + internal/api/router (13 packages, 0 failures).
* MCP tool count re-derive (CLAUDE.md command):
grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go
→ tools.go=121, tools_auth.go=12, tools_auth_bundle2.go=11 (new),
tools_est.go=6 — total 150. Matches the live count
TestMCP_RegisterTools_DispatchableToolCount asserts.
* staticcheck deferred — sandbox /tmp at 99% disk, can't install the
binary; all SA*/ST* lints would have run via the staticcheck-CI step
on push. go vet caught the only real issue (an unused context import)
before commit.
Not in this commit (deferred)
=============================
* Break-glass admin MCP tools (4 endpoints from Phase 7.5). The Phase 9
prompt does NOT enumerate break-glass tools; its exit criteria is
"Every API endpoint from Phase 5 has an MCP tool". Phase 5 does not
include the break-glass surface (Phase 7.5 ships those endpoints with
surface-invisibility semantics: 404 when CERTCTL_BREAKGLASS_ENABLED=false,
which complicates LLM tool-discovery UX). If the operator wants
break-glass MCP parity, that's a follow-on bundle.
|
||
|
|
cbb47aaf5d |
auth-bundle-1 Phase 11 + 12: RBAC MCP tools + negative-test coverage gate
# Phase 11 — RBAC MCP tools
12 new tools in internal/mcp/tools_auth.go mirroring the Phase-4
+ Phase-7 HTTP surface so operators driving certctl from Claude
/ VS Code / any MCP client get the same management capability
the GUI + CLI already expose:
certctl_auth_me GET /v1/auth/me
certctl_auth_list_roles GET /v1/auth/roles
certctl_auth_get_role GET /v1/auth/roles/{id}
certctl_auth_create_role POST /v1/auth/roles
certctl_auth_update_role PUT /v1/auth/roles/{id}
certctl_auth_delete_role DELETE /v1/auth/roles/{id}
certctl_auth_list_permissions GET /v1/auth/permissions
certctl_auth_add_permission_to_role POST /v1/auth/roles/{id}/permissions
certctl_auth_remove_permission_from_role DELETE /v1/auth/roles/{id}/permissions/{perm}
certctl_auth_list_keys GET /v1/auth/keys
certctl_auth_assign_role_to_key POST /v1/auth/keys/{id}/roles
certctl_auth_revoke_role_from_key DELETE /v1/auth/keys/{id}/roles/{role_id}
Each tool routes through the existing HTTP client (no parallel
business logic), so permission gates fire server-side: a
non-admin caller's MCP tool invocation returns whatever 403 the
underlying HTTP handler emits, fenced via errorResult for LLM-
prompt-injection defense.
Input types in internal/mcp/types.go (AuthRoleIDInput,
AuthCreateRoleInput, AuthUpdateRoleInput,
AuthRolePermissionGrantInput, AuthRolePermissionRevokeInput,
AuthAssignKeyRoleInput, AuthRevokeKeyRoleInput) carry
jsonschema descriptions so the MCP consumer's tool catalogue
shows operator-friendly hints.
internal/mcp/tools_auth_test.go ships 14 tests:
- TestAuthMCP_AllToolsRegister (registration must not panic)
- TestAuthMCP_PathsAndMethods (table-driven, 12 rows pinning
each tool's HTTP method + URL)
- TestAuthMCP_ForbiddenSurfacesFencedError (12 tools × 403
mock → error surface)
internal/mcp/tools_per_tool_test.go's allHappyPathCases extended
with the 12 new rows so the in-memory dispatch coverage gate
(TestMCP_RegisterTools_DispatchableToolCount) stays green at the
new total of 139 registered tools.
Re-derived total via 'grep -cE "gomcp\.AddTool\(" internal/mcp/tools*.go':
133 (121 in tools.go + 12 in tools_auth.go).
# Phase 12 — negative-test coverage gate
Audit of the prompt's 12 negative-test paths against existing
coverage:
1. Missing actor → 401 ✓ TestRequirePermission_NoActorReturns401, TestRBACGate_NoActorReturns401
2. No roles → 403 ✓ TestRequirePermission_DeniedActorReturns403, TestRBACGate_AuditorRole_403sOnAdminRoutes
3. Role lacks specific perm → 403 ✓ same suite
4. Wrong scope → 403 ✓ TestAuthorizer_SpecificScopeMatchesExactID (wrongID arm)
5. Self-grant w/o auth.role.assign → 403 ✓ TestActorRoleService_GrantRequiresAuthRoleAssign
6. Bootstrap token wrong → 401 ✓ TestEnvTokenStrategy_WrongTokenReturnsInvalidToken, TestBootstrapHandler_Mint_WrongToken_401
7. Bootstrap used twice → 410 ✓ TestEnvTokenStrategy_OneShotConsumption, TestBootstrapHandler_Mint_TwiceReturns410
8. Bootstrap when admin exists → 410 ✓ TestEnvTokenStrategy_AdminExistsClosesPath, TestBootstrapHandler_Mint_AdminExists410
9. Role delete with assignees → 409 NEW: TestRoleService_DeleteWithActorsAssignedReturns409
10. Profile-edit loophole → gated ✓ TestProfileEdit_RequiresApprovalLoopholeClosed
11. Permission not in catalog → 400 ✓ TestRoleService_AddPermissionRejectsNonCanonical
12. Scope ID for nonexistent resource → 404 (validation deferred — no FK constraint between role_permissions.scope_id and the resource tables; documented for a future bundle)
Filled the gap at #9 with TestRoleService_DeleteWithActorsAssignedReturns409
which pins the repository sentinel pass-through (postgres FK
ON DELETE RESTRICT → repository.ErrAuthRoleInUse → service
returns the sentinel verbatim → handler maps to HTTP 409).
# Coverage gates
.github/coverage-thresholds.yml gains 2 entries:
- internal/auth: floor 85
- internal/service/auth: floor 85
.github/workflows/ci.yml's coverage test command extended with
./internal/auth/... and ./internal/api/router/... so the
threshold check has data to evaluate.
# Protocol-endpoint not-gated test (Category F)
internal/api/router/phase12_protocol_allowlist_test.go (new)
adds 3 router-level invariant tests:
- TestPhase12_ProtocolEndpointsNotGated: AST-walks router.go,
asserts no rbacGate(...) call references a path under any
protocol-endpoint prefix (/acme, /scep, /.well-known/est,
/.well-known/pki/ocsp, /.well-known/pki/crl).
- TestPhase12_IsProtocolEndpoint_CoversCanonicalPrefixes:
pins auth.IsProtocolEndpoint against the canonical prefix
set; if a future protocol lands without lockstep allowlist
update, this fails.
- TestPhase12_RBACGateRoutesAreUnderAPIv1: belt-and-braces —
every rbacGate-wrapped route MUST start with /api/v1/.
Catches accidental cross-prefix wraps.
Complements the existing TestRequirePermission_ProtocolEndpointBypassesGate
(middleware-level) + TestRouter_AuthExemptAllowlist_PinsActualRegistrations
(allowlist drift) so the Category F invariant is pinned at all
three layers (middleware + router + dispatch).
# Verifications
* gofmt clean repo-wide.
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
service + repository + cmd + domain + mcp: clean.
* go test -short -count=1 green across internal/auth (incl.
bootstrap), internal/api/handler, internal/api/router,
internal/cli, internal/service (incl. auth),
internal/domain/auth, internal/mcp, cmd/server, cmd/cli.
|
||
|
|
99a012e3be |
auth-bundle-1 Phase 0: extract internal/auth/ from middleware package
Bundle 1 / Phase 0: pure refactor splitting auth surface out of internal/api/middleware so Bundle 2 (OIDC + sessions) and the broader RBAC primitive (roles, permissions, scoped grants) have a clean home. Moved to internal/auth/: NamedAPIKey, HashAPIKey, AuthConfig, NewAuthWithNamedKeys, NewAuth, UserKey, AdminKey, GetUser, IsAdmin. Added testfixtures.go (WithActor / WithAdmin / WithActorAdmin) so handler tests don't construct context manually. Stayed in internal/api/middleware/: RequestID, Logging, NewLogging, Recovery, RateLimitConfig, NewRateLimiter (now imports auth.GetUser for per-user keying per audit Category C), CORSConfig, NewCORS, ContentType, CORS, GetRequestID, responseWriter, Chain, audit middleware (now imports auth.GetUser). Updated 22 caller files across cmd/, internal/api/handler/, internal/api/middleware/, internal/mcp/. Existing m008_admin_gate_test.go now scans for auth.IsAdmin( substring; Phase 3 will further evolve to track auth.RequirePermission. Behavior unchanged: all handler / middleware / service / connector / cmd / mcp tests pass with no test-logic edits, only import-path renames. Phase 0 exit criteria: internal/auth/ exists with 6 files; middleware.go went 575 -> 422 lines (auth-related ~150 lines moved out); grep -rE 'middleware\.(GetUser|IsAdmin|UserKey|AdminKey|NamedAPIKey|HashAPIKey|NewAuth)' returns 0 hits; context.WithValue(.*middleware.UserKey/AdminKey) returns 0 hits; go vet ./... clean; go test -short ./... green across all packages tested. Branch: dev/auth-bundle-1. Per cowork/auth-bundle-1-prompt.md, do not merge to master without (1) make verify green, (2) >= 2 external testers confirm, (3) >= 90% coverage on internal/auth/ in .github/coverage-thresholds.yml. |
||
|
|
ff75361553 |
mcp(coverage): add 34 tools across 7 domains to close 2026-05-05 parity audit P1 findings
Closes findings P1-1..P1-35 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Before this
bundle, 35 operator-facing API endpoints had GUI surfaces but no MCP
counterpart — operators using AI assistants for cert lifecycle work in
regulated environments had to drop to curl for approve/reject, health-check
acknowledgement, renewal-policy CRUD, network-scan triggering, discovery
triage, intermediate-CA management, and job verification.
Tool count: 87→121 in tools.go (+34), 6 unchanged in tools_est.go.
Re-derive via grep -cE 'gomcp\\.AddTool\\(' internal/mcp/tools.go
internal/mcp/tools_est.go.
The 7 phases (matching the bundle prompt at
cowork/mcp-coverage-expansion-prompt.md):
Phase A — Approvals (P1-28..P1-31, 4 tools)
list_approvals, get_approval, approve_request, reject_request.
Two-person-integrity contract (ErrApproveBySameActor → HTTP 403)
is preserved automatically: the decided_by actor is derived
server-side from middleware.UserKey, NOT from request body, so
the MCP server's authenticated API-key identity becomes the
audit-trail actor. The MCP input schema deliberately omits any
actor_id field to prevent client-side spoofing.
Phase B — Health Checks (P1-20..P1-27, 8 tools)
list, summary, get, create, update, delete, history, acknowledge.
Mirrors the existing target-resource shape; acknowledge takes
optional 'actor' string captured in the audit row (handler defaults
to 'unknown' if absent).
Phase C — Renewal Policies (P1-1..P1-5, 5 tools)
Standard CRUD against /api/v1/renewal-policies. Distinct from the
legacy 'policy' tools that point at the same path — these expose
the renewal-policy domain explicitly with full alert_channels +
alert_severity_map field shape.
Phase D — Network Scan Targets (P1-14..P1-19, 6 tools)
CRUD + trigger_scan. trigger_network_scan returns the discovery-
scan body so the AI can chain into list_discovered_certificates
filtered by agent_id.
Phase E — Discovery read-side (P1-10..P1-13, 4 tools)
list_discovered_certificates, get_discovered_certificate,
list_discovery_scans, discovery_summary. Complements the
pre-existing claim/dismiss tools (registered alongside Health
historically per the I-2 closure).
Phase F — Intermediate CAs (P1-6..P1-9, 4 tools)
list, create (root + child via discriminator on body shape), get,
retire. The handler is admin-gated via middleware.IsAdmin; the
least-privilege boundary is enforced at the API layer (HTTP 403
for non-admin Bearer callers) — not by transport carve-out.
Phase G — Verification + deployments (P1-32, P1-34, P1-35, 3 tools)
list_certificate_deployments, verify_job, get_job_verification.
P1-33 (POST /api/v1/agents/{id}/discoveries) is intentionally
excluded — machine-to-machine push channel for agents reporting
filesystem-scan results, not an operator-driven flow. Documented
inline in the RegisterTools dispatch.
Implementation:
- 14 new input types in internal/mcp/types.go with jsonschema struct
tags driving LLM tool discovery.
- 7 register* functions in internal/mcp/tools.go each handling one
phase, wired into RegisterTools dispatch in declaration order.
- 34 new entries in tools_per_tool_test.go::allHappyPathCases —
the existing in-process MCP harness (TestMCP_AllTools_HappyPath +
TestMCP_AllTools_ErrorPath + TestMCP_RegisterTools_DispatchableToolCount)
auto-extends coverage to cover every new tool: happy-path round-
trip with fence-shape assertion, 5xx error-path with MCP_ERROR fence
propagation, and 'every registered tool is dispatchable' guard.
- docs/reference/mcp.md 'Available Tools' table expanded from 16 to
22 resource domains with current per-domain tool counts.
Acceptance gate (verified):
- go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/...
clean across all four production binaries.
- go vet ./... clean.
- go test -short -count=1 ./internal/mcp/... pass (TestMCP_AllTools_*
expanded to 127 tool round-trips).
- go test -short -count=1 ./... pass repo-wide.
- bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178,
OpenAPI 144, exceptions 36 — unchanged; we add MCP wrappers, not
routes).
- gofmt -l clean across the four touched files.
|
||
|
|
75097909e9 | |||
|
|
36885da2da |
EST RFC 7030 hardening master bundle Phases 8-9: GUI ESTAdminPage
(Profiles + Recent Activity + Trust Bundle tabs) + CLI subcommand
family `certctl-cli est {cacerts,csrattrs,enroll,reenroll,
serverkeygen,test}` + 6 MCP tools.
Phase 8 — ESTAdminPage tabbed GUI:
- web/src/pages/ESTAdminPage.tsx mirrors SCEPAdminPage's three-tab
surface. Profiles tab renders per-profile cards with auth-mode
badges (mTLS / Basic / ServerKeygen), mTLS trust-anchor expiry
countdown (good ≥30d / warn 7-30d / bad <7d / EXPIRED), 12-cell
counter grid (success_simpleenroll/.../internal_error), and the
admin-gated "Reload trust anchor" action. Recent Activity tab
merges the four EST audit actions (est_simple_enroll +
est_simple_reenroll + est_server_keygen + est_auth_failed) across
four parallel useQuery calls with chip filters for All/Enrollment/
Re-enrollment/ServerKeygen/AuthFailure. Trust Bundle tab renders
per-mTLS-profile cert subjects + expiries.
- M-009 useTrackedMutation guard: every mutation routes through
the tracked hook so audit/progress hooks fire.
- Page-level admin gate renders "Admin access required" banner for
non-admin callers + skips underlying API requests so the server
never sees a 403-prone request. Server-side enforcement is the
M-008 admin gate; this is a UX hint.
- Wired into web/src/main.tsx at /est; nav link added to Layout.tsx.
- New web/src/api/types.ts types ESTStatsSnapshot +
ESTTrustAnchorInfo + ESTProfilesResponse + ESTReloadTrustResponse
mirror service.ESTStatsSnapshot 1:1.
- New web/src/api/client.ts helpers getAdminESTProfiles +
reloadAdminESTTrust.
- 14 Vitest cases (admin gate non-admin / non-auth-required deploy /
default tab / tab switch / deep-link tab / per-profile card render
+ counter cells / reload-button mTLS-only / trust-expiry badge
band / reload modal Confirm-Cancel-Error paths / Trust Bundle
empty-state / Activity filter chip toggle).
Phase 9.1 — CLI subcommands:
- internal/cli/est.go adds 6 subcommands: cacerts / csrattrs /
enroll / reenroll / serverkeygen / test. CSR input via --csr
with file-path or '-' for stdin; multipart serverkeygen response
is parsed by stdlib mime/multipart and split into <prefix>.cert.pem
+ <prefix>.key.enveloped so the operator can decrypt the key with
openssl smime. EST `test` smoke-tests cacerts + csrattrs + emits
one-line OK/FAIL diagnostics.
- cmd/cli/main.go grows the `est` dispatch + Usage entries.
Phase 9.2 — MCP tools:
- internal/mcp/tools_est.go adds 6 tools mapped to the EST endpoints
+ admin observability: est_list_profiles + est_admin_stats (alias)
+ est_get_cacerts + est_get_csrattrs + est_enroll + est_reenroll.
Tool count grew from 87 → 93 (verified via the registered-vs-
covered guard in tools_per_tool_test.go); the per-tool happy/error-
path table grew with 6 matching entries so the future-tool-no-test
CI guard stays green.
- internal/mcp/client.go grows PostRaw — non-JSON POST helper that
the EST enroll/reenroll tools use to ship raw application/pkcs10
CSR bytes through the MCP fence-wrapped response.
- estRawResultJSON wraps the raw response body in a JSON envelope
the MCP consumer can structurally consume (content_type +
body_base64 + body_size_bytes). Mirrors the CRL/OCSP MCP tools'
binary-DER envelope.
Phase 9.3 — Tests:
- internal/cli/est_test.go: 8 cases pinning the wire-shape contract
on the CLI side without dragging the full ESTHandler into the
test build.
- internal/mcp/tools_est_test.go: path-builder + JSON-envelope unit
tests + end-to-end tool exercise that pins all 5 captured request
paths through a fake API.
Pre-commit verification (sandbox): gofmt clean, go vet clean
(excluding repository/postgres which the sandbox can't build —
pre-existing testcontainers limit), staticcheck clean across
cli/mcp/cmd/cli, go test -short -count=1 green for every non-
postgres Go package, Vitest green for ESTAdminPage (14) +
SCEPAdminPage (20) — 34 page tests total. G-3 docs-drift guard
reproduced locally clean (Phases 8-9 added zero new env vars).
Spec preserved at cowork/est-rfc7030-hardening-prompt.md. Phases
10-13 (libest sidecar e2e / bulk revocation + audit codes /
docs/est.md / release prep + tag) remain — post-2.1.0 work.
|
||
|
|
52b86a08f4 |
Bundle K (Coverage Audit Closure): MCP per-tool coverage — C-002 closed
internal/mcp line coverage 28.0% -> 93.1% (+65.1pp; +8.1 above target)
via internal/mcp/tools_per_tool_test.go (~580 LoC, 4 top-level + 174 sub-tests).
Strategy: gomcp.NewInMemoryTransports() wires an in-process client +
server pair; RegisterTools(server, client) is invoked against a mock
certctl API; every one of 87 registered tools is dispatched via
clientSession.CallTool. This is the first test in the package that
exercises the closure bodies inside register*Tools — existing tests
(tools_test.go, injection_regression_test.go, fence_guardrail_test.go,
retire_agent_test.go) tested the wrapper + HTTP client in isolation.
Tests:
TestMCP_AllTools_HappyPath: 87 sub-tests, mock 'ok' mode,
asserts response fence end-to-end.
TestMCP_AllTools_ErrorPath: 87 sub-tests, mock '5xx' mode,
asserts MCP_ERROR fence.
TestMCP_FenceInjectionResistance: 50 dispatches; asserts per-call
nonce uniqueness (security property).
TestMCP_FenceWithPlantedEndMarker: planted attacker nonce does not
collide with real RNG nonce.
TestMCP_RegisterTools_DispatchableToolCount: tool-inventory check
(87 registered == 87 covered).
Per-register*Tools coverage:
registerCertificateTools: 11.2% -> 84.1%
registerCRLOCSPTools: 20.0% -> 100.0%
registerIssuerTools: 20.0% -> 100.0%
registerTargetTools: 20.0% -> 100.0%
registerAgentTools: 13.5% -> 86.5%
registerJobTools: 15.2% -> 90.9%
registerPolicyTools: 19.4% -> 100.0%
registerProfileTools: 20.0% -> 100.0%
registerTeamTools: 20.0% -> 100.0%
registerOwnerTools: 20.0% -> 100.0%
registerAgentGroupTools: 20.0% -> 100.0%
registerAuditTools: 20.0% -> 100.0%
registerNotificationTools: 17.4% -> 95.7%
registerStatsTools: 14.7% -> 91.2%
registerDigestTools: 20.0% -> 100.0%
registerMetricsTools: 20.0% -> 100.0%
registerHealthTools: 19.4% -> 100.0%
Binary-blob tools (certctl_get_der_crl, certctl_ocsp_check) bypass
textResult by design — they return human-readable summaries instead
of fenced JSON. Matches the existing fence_guardrail_test.go allowlist.
Verification:
go vet ./internal/mcp/... clean
gofmt -l internal/mcp/ clean
staticcheck -checks all clean (only pre-existing S1009 +
ST1000 hits in master remain)
go test -short -cover 93.1% coverage
go test -race -count=1 PASS, 0 races
Audit deliverables:
findings.yaml: C-002 status open -> closed
gap-backlog.md: closure log + C-002 strikethrough
coverage-matrix.md: MCP row at 93.1%
closure-plan.md: Bundle K [x] closed
CHANGELOG.md: [unreleased] Bundle K entry
|
||
|
|
23411bd6fc |
fix(bundle-3): MCP Trust-Boundary Fencing — 5 audit findings closed
Closes Audit-2026-04-25 H-002, H-003, M-003, M-004, M-005 (all CWE-1039 LLM Prompt Injection at the MCP↔consumer trust boundary, TB-7). Strategy: wrapper-layer fencing. All 87 MCP tools route their success path through textResult and their failure path through errorResult. By fencing at those two wrappers we cover every existing tool AND every future tool with a single change — no per-tool wiring required. What changed - internal/mcp/fence.go (new) — FenceUntrusted helper with strategy doc + per-finding rationale. Both fenceMCPResponse and fenceMCPError use it internally. - internal/mcp/tools.go — textResult wraps response body via fenceMCPResponse; errorResult wraps error string via fenceMCPError. - internal/mcp/tools_test.go — TestTextResult / TestErrorResult updated to assert fenced shape (start marker + end marker + inner body). - internal/mcp/injection_regression_test.go (new) — 5 regression test functions, one per audit finding, each replays 5 classic LLM injection payloads (instruction_override, system_role_spoofing, delimiter_break_attempt, markdown_link_phishing, data_exfil_via_url) and asserts the planted payload appears VERBATIM (preservation, operator visibility) INSIDE the fence boundaries. - internal/mcp/fence_guardrail_test.go (new) — CI guardrail that walks every non-test .go file in the mcp package and fails if it finds a bare gomcp.CallToolResult literal outside tools.go. Prevents future tools from silently bypassing the fence. Delimiter-forgery defense The naive constant fence (--- UNTRUSTED MCP_RESPONSE END ---) is forgeable: an attacker who controls a field value can plant the literal end marker and "break out" of the fence. Defense: every fence call generates a 6-byte crypto/rand nonce, hex-encoded, and embeds it in BOTH the START and END markers. An attacker would need to predict the nonce (2^48 search per fence) to forge a matching END inside the payload. The delimiter_break_attempt regression test exercises this. Per-finding mapping - H-002 Cert Subject DN injection (CSR submitter controlled) → TestMCP_PromptInjection_H002_CertSubjectDN - H-003 Discovered cert metadata injection (cert owner controlled) → TestMCP_PromptInjection_H003_DiscoveredCertMetadata - M-003 Agent heartbeat injection (agent self-reports hostname/OS/IP) → TestMCP_PromptInjection_M003_AgentHeartbeat - M-004 Upstream CA error injection (CA controls error string) → TestMCP_PromptInjection_M004_UpstreamCAError - M-005 Audit details + notification body injection (downstream actors control these) → TestMCP_PromptInjection_M005_AuditDetailsAndNotifications Verification gates - go vet ./... → clean - go build ./... → clean - go test -short -count=1 ./... → all packages pass - go test -count=1 ./internal/mcp/... → all packages pass - npx tsc --noEmit (web) → clean - npx vitest run (web) → 337 passed - python3 yaml.safe_load(api/openapi.yaml) → 89 paths, 56 schemas Threat-model placement: TB-7 (MCP↔LLM consumer). certctl owns the boundary; consumer-side prompt engineering is recommended but not relied upon. Defense-in-depth: per-call nonce closes the delimiter-forgery edge case that constant fences would have left exposed. Bundle 3 of the 2026-04-25 comprehensive audit (88 findings). |
||
|
|
25c34ace45 |
feat(mcp): add claim_discovered + dismiss_discovered MCP tools (I-2 closure)
Closes the LAST P1 in the 2026-04-24 audit (cat-i-b0924b6675f8). Pre-I-2
the README claimed "all API endpoints are exposed via MCP" but the
discovered-certificate lifecycle (HTTP handlers ClaimDiscovered +
DismissDiscovered at internal/api/handler/discovery.go:125,162) had
zero MCP tool wrappers — operators using Claude / Cursor / similar
MCP clients had no path to bring an out-of-band cert under management
or to mark a benign discovery as not-of-interest without dropping to
the REST API directly. The audit's count of 0 MCP discovery tools
was correct: `grep -niE 'discover|claim|dismiss' internal/mcp/tools.go`
returned only the pre-existing agent-retire tool's description text
mentioning sentinel discovery agents — no actual discovery-tool
registrations.
Added in internal/mcp/types.go:
- ClaimDiscoveredCertificateInput (id + managed_certificate_id)
- DismissDiscoveredCertificateInput (id)
Both follow the existing Go-doc / staticcheck convention (lead with
the type name + brief; closure-rationale prose follows). Pinned by
the existing L-1 staticcheck-fix lesson.
Added in internal/mcp/tools.go (slotted at end of file, after
certctl_auth_check):
- certctl_claim_discovered_certificate — POST /api/v1/discovered-certificates/{id}/claim
- certctl_dismiss_discovered_certificate — POST /api/v1/discovered-certificates/{id}/dismiss
Both wrap the existing HTTP handlers via the generic c.Post helper.
No backend changes; no openapi.yaml changes (both ops were already
in the spec from earlier work).
The audit's third name "acknowledge" is NOT closed: at recon, no
notification-acknowledge HTTP handler exists in the API surface
(grep across internal/api/handler/ returned zero hits for
"acknowledge"). The audit appears to have mis-quoted; "acknowledge"
isn't a real backend endpoint to wrap. If a future feature adds
notification acknowledgement, register it in the same shape.
Verification:
- go build ./... — clean
- go vet ./internal/mcp/... — clean
- go test ./internal/mcp/... -count=1 — pass
- golangci-lint v2.11.4 run ./... — 0 issues
- MCP tool count went from 85 → 87 (verify via `grep -cE 'gomcp\.AddTool\(' internal/mcp/tools.go`)
- S-1 + G-3 + D-1 + D-2 + B-1 + L-1 CI guardrails all still pass
Audit findings closed:
- cat-i-b0924b6675f8 (P1, MCP discovery completeness — last P1 in audit)
This brings the audit to ZERO REMAINING P1s.
Deferred follow-ups:
- Notification acknowledge MCP tool — add when a notification-ack
HTTP handler exists. Currently no such handler exists in the
API surface; treat as a separate feature, not an MCP gap.
|
||
|
|
2edac7e78b |
fix(mcp): close staticcheck ST1021 on BulkRenew/BulkReassign input docstrings
CI on the B-1 merge (
|
||
|
|
f0865bb051 |
fix(api,web,mcp): add bulk-renew + bulk-reassign endpoints, drop client-side N×HTTP loops (L-1 master)
Two audit findings, both category cat-l, both rooted in
web/src/pages/CertificatesPage.tsx. Pre-L-1 the GUI looped per-cert
HTTP calls — 100 selected certs = 100 sequential round-trips × ~50–200
ms each = a 5–20-second wedge during which the operator stared at a
progress bar. Post-L-1 each workflow is a single POST.
cat-l-fa0c1ac07ab5 [P1, primary] — bulk renew loop
handleBulkRenewal: for/await triggerRenewal(id)
cat-l-8a1fb258a38a [P2] — bulk reassign loop
handleReassign: for/await updateCertificate(id, {owner_id})
The bulk-revoke endpoint (POST /api/v1/certificates/bulk-revoke +
BulkRevocationCriteria/Result) already existed as the canonical shape
in v2.0.x — L-1 ports that pattern to renew + reassign with per-action
twists.
Backend (Go)
- internal/domain/bulk_renewal.go: BulkRenewalCriteria mirrors
BulkRevocationCriteria (criteria + IDs modes); BulkRenewalResult
envelope adds EnqueuedJobs[] for per-cert {certificate_id, job_id};
shared BulkOperationError type for all bulk paths.
- internal/domain/bulk_reassignment.go: narrower shape — IDs-only,
owner_id required, team_id optional.
- internal/service/bulk_renewal.go::BulkRenewalService.BulkRenew:
resolves criteria → status filter (Archived/Revoked/Expired/
RenewalInProgress all silent-skip) → per-cert status flip + job
create. Keygen-mode-aware so jobs land in the same initial status
as single-cert TriggerRenewal. Single bulk audit event per call,
not N.
- internal/service/bulk_reassignment.go::BulkReassignmentService.
BulkReassign: validates owner_id upfront via the
ErrBulkReassignOwnerNotFound typed sentinel — non-existent owner
returns 400 before any cert is touched. Already-owned-by-target
is silent-skip. Single bulk audit event.
- internal/api/handler/{bulk_renewal,bulk_reassignment}.go: HTTP
shape mirrors bulk_revocation.go. NOT admin-gated (renew is non-
destructive; reassign is a common-case workflow). Sentinel-error
→ 400 mapping for OwnerNotFound.
- internal/api/router/router.go: three bulk-* routes registered as a
block before the {id} routes. HandlerRegistry gains BulkRenewal +
BulkReassignment fields.
- cmd/server/main.go: NewBulkRenewalService threads cfg.Keygen.Mode
so bulk-renew jobs land in same initial state as single-cert path.
Frontend
- web/src/api/client.ts: bulkRenewCertificates(criteria) +
bulkReassignCertificates(request) functions with full TS types.
- web/src/pages/CertificatesPage.tsx: handleBulkRenewal + handleReassign
rewritten from N-call loops to single calls. Result envelope drives
progress UI; first-error message surfaced when total_failed > 0.
Stale triggerRenewal + updateCertificate imports removed.
MCP
- internal/mcp/types.go: BulkRenewCertificatesInput +
BulkReassignCertificatesInput.
- internal/mcp/tools.go: certctl_bulk_renew_certificates +
certctl_bulk_reassign_certificates tools mirroring the existing
certctl_bulk_revoke_certificates pattern.
OpenAPI
- api/openapi.yaml: two new operations (bulkRenewCertificates,
bulkReassignCertificates) under Certificates tag. Four new schemas
(BulkRenewRequest, BulkRenewResult, BulkEnqueuedJob,
BulkReassignRequest, BulkReassignResult).
Tests
- Domain: BulkRenewalCriteria.IsEmpty + BulkReassignmentRequest.IsEmpty
IsEmpty contracts; JSON round-trip shape pinning.
- Service: 7 BulkRenew tests (happy/criteria-mode/skips-RenewalInProgress/
skips-revoked-archived/empty-criteria-error/partial-failure/
audit-event-emitted) + 8 BulkReassign tests (happy/skips-already-
owned/owner-required/empty-IDs/owner-not-found-sentinel/team-id-
optional/team-id-provided/partial-failure/audit-event-emitted).
- Handler: 5 BulkRenew handler tests (happy/empty-body-400/wrong-
method-405/actor-attribution/service-error-500) + 6 BulkReassign
handler tests (happy/empty-IDs-400/missing-owner-400/owner-not-
found-400-via-sentinel/wrong-method-405/generic-error-500).
CI guardrail
- .github/workflows/ci.yml: 'Forbidden client-side bulk-action loop
regression guard (L-1)'. Greps web/src/pages/CertificatesPage.tsx
for 'for(...) await triggerRenewal(...)' and 'for(...) await
updateCertificate(...)' patterns; comment lines exempt; test files
exempt. Verified locally (passes against post-fix tree, fires
against synthetic regression).
Counts (deltas)
- Routes: 119 → 121 (+2)
- OpenAPI operations: 123 → 125 (+2)
- MCP tools: 83 → 85 (+2)
Performance
- 100-cert bulk-renew: ~10s of sequential HTTP → ~100ms (99% latency
reduction on the canonical operator workflow).
- Audit event volume: 1 + N per operation → 1.
Out of scope (deferred follow-ups)
- cat-b-31ceb6aaa9f1: updateOwner/updateTeam/updateAgentGroup orphan
(different shape — wire existing PUT to GUI, not new bulk endpoint).
- cat-k-e85d1099b2d7: CertificatesPage no pagination UI.
- cat-i-b0924b6675f8: MCP missing claim/dismiss/acknowledge (L-1 added
2 new tools but does not close that finding).
Verification
- go build / vet / test -short / test -short -race all clean.
- web tsc --noEmit + vitest run all clean (296 tests passing).
- OpenAPI YAML parses (89 paths, 125 ops).
- L-1 CI guardrail passes against post-fix tree, fires against
synthetic regression.
No push.
|
||
|
|
52248be717 |
v2.0.47: HTTPS Everywhere — TLS-only control plane, agents/CLI/MCP
Breaking change release. Plaintext HTTP listener removed. The certctl control plane now terminates TLS 1.3 on :8443 via http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md. Server - cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback), preflightServerTLS validation - cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe, watchSIGHUP wiring, cert/key path config threading - tls_test.go: 418-line regression coverage of reload, preflight, callback behavior, SAN validation Config - CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required) - Plaintext rejection: agents/CLI/MCP pre-flight-fail on http:// URLs with a pointer to docs/upgrade-to-tls.md Agents, CLI, MCP - All three pre-flight-reject http:// URLs with fail-loud diagnostic - CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust - CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass (loud warning on startup) - install-agent.sh emits both vars as commented template lines docker-compose - certctl-tls-init sidecar generates SAN-valid self-signed cert into deploy/test/certs/ on first boot - All demo-stack curls pin against ca.crt with --cacert Helm chart - Three TLS provisioning modes, exactly one required: - server.tls.existingSecret (operator-supplied) - server.tls.certManager.enabled (cert-manager integration) - server.tls.selfSigned.enabled (eval only — not for production) - server-certificate.yaml template for cert-manager mode - helm install without a TLS source fails at template render with a pointer to docs/tls.md CI - .github/workflows/ci.yml Helm Chart Validation step renders the chart in both existingSecret and cert-manager modes, plus an inverse guard-regression test that asserts helm template MUST refuse to render when no TLS source is configured. Previously the single `helm template` invocation hit the certctl.tls.required fail-loud guard and exit-1'd CI. Four invocations now: lint (existingSecret), template (existingSecret), template (cert-manager), template (no args — must fail). Integration tests - deploy/test/integration_test.go stands up the Compose stack over HTTPS, extracts the CA bundle, and exercises every certctl API over https://localhost:8443 - All 34 integration subtests green (per Phase 8 local CI-parity) Documentation - New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload) - New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade warnings, fleet-roll sequencing) - CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry (file heading unchanged; release tag is v2.0.47) - All curls in docs/, examples/, deploy/helm/ guides use https://localhost:8443 --cacert Verification - grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits - grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin API default, SSRF doc comment) — zero certctl endpoints - Tasks #197–#206 (Phases 0–8) all closed in the tracker Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix). |
||
|
|
675b87ba63 |
I-005: notification retry loop + dead-letter queue
Critical alerts can no longer be silently dropped by a transient
notifier failure. Failed notification attempts now ride an exponential
backoff retry loop, with a 5-attempt budget before promotion to the
dead-letter queue for operator intervention.
Schema (migration 000016, idempotent):
- retry_count INTEGER NOT NULL DEFAULT 0
- next_retry_at TIMESTAMPTZ
- last_error TEXT
- idx_notification_events_retry_sweep partial index
(next_retry_at) WHERE status='failed' AND next_retry_at IS NOT NULL
Dead rows clear next_retry_at so the index stops matching them.
Service contract:
- NotificationService.RetryFailedNotifications drives 2^n-minute
exponential backoff capped at 1h (notifRetryBackoffCap) with
5-attempt budget (notifRetryMaxAttempts).
- Exhaustion (RetryCount >= notifRetryMaxAttempts-1) promotes to
status='dead' via MarkAsDead.
- Non-terminal failures record via RecordFailedAttempt.
- Success path promotes to 'sent' without touching retry_count
(audit preserves "delivered on attempt N").
- Missing-notifier branch defensively promotes to 'sent' to avoid
wedging a row on a deleted channel.
- RequeueNotification operator escape hatch atomically resets
retry_count -> 0, next_retry_at -> NULL, last_error -> NULL,
status -> pending via notifRepo.Requeue.
Scheduler:
- New always-on notificationRetryLoop wired into the base loop set at
CERTCTL_NOTIFICATION_RETRY_INTERVAL (default 2m).
- sync/atomic.Bool idempotency guard.
- sync.WaitGroup shutdown drain via WaitForCompletion.
StatsService:
- SetNotifRepo setter pattern preserves 9 pre-existing
NewStatsService call sites (main.go + stats_test.go + 8 digest
tests) without touching the constructor signature.
- DashboardSummary.NotificationsDead populated via
notifRepo.CountByStatus(ctx, "dead") — nil-safe when unwired
(reports zero on systems without a notification repository).
- CountByStatus error is non-fatal (dashboard summary is
best-effort for this field).
- Prometheus certctl_notification_dead_total counter emitted from
the same snapshot.
Handler:
- New POST /api/v1/notifications/{id}/requeue endpoint.
- dead status surfaces to MCP + CLI.
Frontend:
- NotificationsPage gains two-tab toolbar ("All" / "Dead letter")
with queryKey: ['notifications', activeTab] so switching tabs
doesn't serve stale data until the 30s refetch.
- Dead rows surface "Retry {n}/5" + truncated last_error with
full-text title tooltip.
- Requeue mutation wrapped as
mutationFn: (id: string) => requeueNotification(id)
to prevent react-query v5's positional context argument from
leaking into the API client — pinned against future refactors
by strict-match toHaveBeenCalledWith('notif-dead-001') in
NotificationsPage.test.tsx:181.
Closes I-005.
|
||
|
|
0725713e19 |
Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.
Schema — migration 000015:
migrations/000015_agent_retire.up.sql flips
deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
boundary instead of quietly destroying targets. Both `agents` and
`deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
TEXT pair (TEXT not VARCHAR so operator comments are never
truncated), indexed via partial indexes WHERE retired_at IS NOT
NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
EXISTS) so repeated runs against partially-migrated databases
converge. migrations/000015_agent_retire.down.sql restores CASCADE
and drops the new columns for clean rollback. A dedicated
repository-layer testcontainers test
(internal/repository/postgres/migration_000015_test.go) asserts the
before/after FK action, column presence, index presence, and
round-trip idempotency under up→down→up.
Domain — sentinel guard + dependency counts:
internal/domain/connector.go gains IsRetired() on Agent, the
exported SentinelAgentIDs slice listing server-scanner,
cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
four reserved IDs documented in CLAUDE.md and created at startup in
cmd/server/main.go), IsSentinelAgent(id string) predicate,
AgentDependencyCounts{ActiveTargets, ActiveCertificates,
PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
ActorTypeSystem enum values used by audit emission downstream.
Coverage locked down by internal/domain/connector_test.go.
Service — 8-step ordered contract:
internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
opts{Force, Reason}) enforces a fixed execution order:
(1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
unconditionally; force=true does NOT bypass it.
(2) fetch — ErrAgentNotFound on miss.
(3) idempotency — if IsRetired() already, return
AgentRetirementResult{AlreadyRetired: true} with no new audit
event and no state change (safe to replay from flaky clients).
(4) preflight counts — collectAgentDependencyCounts runs
ActiveTargets, ActiveCertificates, PendingJobs sequentially
(not in parallel; keeps the per-query timeout predictable and
matches the repo's existing call-chain shape).
(5) force-reason guard — opts.Force=true with empty Reason returns
ErrForceReasonRequired (wired into the 400 status surface).
(6) dependency guard — HasDependencies() with opts.Force=false
returns BlockedByDependenciesError{Counts} (wired into the 409
body with per-bucket counts).
(7) mutation — single pinned retiredAt := time.Now(); agent
retirement first, then cascade target retirement if opts.Force,
all under the repo's single transaction so the two retired_at
stamps match to the second.
(8) best-effort audit — agent_retired always; agent_retirement_
cascaded additionally on the force path. Actor is whatever the
handler resolves from the request; actor type is mapped by
resolveActorType (system/agent-prefix→Agent/else→User). Audit
emission failures are logged via slog.Error but do not abort
the retirement (matches the house convention used by every
other scheduler-emitted event).
BlockedByDependenciesError implements Error() as
"active_targets=%d, active_certificates=%d, pending_jobs=%d" and
Unwrap() → ErrBlockedByDependencies. The single struct satisfies
errors.Is via Unwrap (used by scheduler-level tests) and errors.As
via the concrete type (used by the handler to fish out Counts for
the 409 body). ListRetiredAgents(page, perPage) adds a separate
paginated accessor with page<1→1 and perPage<1→50 normalization so
retired rows are queryable without polluting the default agent
listing.
Sentinel guard coverage is asymmetric by design: all four reserved
IDs are protected, and force=true cannot override. Regression tests
in internal/service/agent_retire_test.go assert each of the eight
steps in order, plus sentinel bypass attempts and idempotency
replay.
Handler + router — status-code surface:
internal/api/handler/agents.go:RetireAgent exposes seven status
codes on DELETE /agents/{id}:
200 on a fresh retirement (body echoes AgentRetirementResult).
204 on idempotent replay (AlreadyRetired=true; no new audit).
400 on ErrForceReasonRequired.
403 on ErrAgentIsSentinel.
404 on ErrAgentNotFound.
409 on BlockedByDependenciesError, with a custom body shape
{error, counts{active_targets, active_certificates,
pending_jobs}} that bypasses the default ErrorWithRequestID
envelope so callers get the per-bucket numbers directly.
500 on any other error.
Heartbeat HandleHeartbeat returns 410 Gone when the agent is
retired (ErrAgentRetired), signalling the agent to shut down.
Query params `force=true` and `reason=<text>` drive the cascade
path; both are forwarded as url.Values through the new MCP
transport.
internal/api/router/router.go registers GET /api/v1/agents/retired
literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
literal-beats-pattern-var precedence routes "retired" to the
paginated retired-agents listing instead of fetching a hypothetical
agent named "retired".
Agent binary — clean shutdown on 410:
cmd/agent/main.go gains the ErrAgentRetired sentinel, a
retiredOnce sync.Once, and a retiredSignal chan struct{}. A
markRetired(source, statusCode, body) helper closes the channel
exactly once; the Run() select loop observes the close and returns
ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
and exits cleanly instead of spinning in the heartbeat retry loop.
The 410 Gone surface is therefore terminal for the agent process.
MCP transport:
internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
a new additive transport method. Client.Delete is path-only; without
this method the retire tool would silently drop `force` and `reason`,
turning every cascade retire into a default soft-retire. The new
method shares do()'s 204 normalization and 4xx/5xx error
propagation so tool authors get one contract.
internal/mcp/tools.go + internal/mcp/types.go expose the
retire_agent tool with Force+Reason inputs wired through
DeleteWithQuery.
CLI:
cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
`agents list --retired` (client-side strip of --retired then
delegation to ListRetiredAgents, sharing --page/--per-page parsing
with the default listing) and `agents retire <id> [--force --reason
"…"]` (mirrors ErrForceReasonRequired — force without reason is
rejected client-side before the request is sent). JSON + table
output modes both honor the new columns.
Frontend:
web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
web/src/api/client.ts + web/src/api/types.ts expose the retire
endpoint and the retired-listing. 4 new Vitest regression cases.
OpenAPI:
api/openapi.yaml documents DELETE /agents/{id} with all seven
status codes, 410 on heartbeat, and the 409 per-bucket body shape.
Regression coverage (six new test files, all green):
internal/service/agent_retire_test.go — 8-step contract + sentinel guards
internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
internal/mcp/retire_agent_test.go — DeleteWithQuery wire-through
internal/cli/agent_retire_test.go — --retired listing + --force/--reason pairing
internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
internal/domain/connector_test.go — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies
Files:
api/openapi.yaml — DELETE + 410 + 409 body shape
cmd/agent/main.go — ErrAgentRetired, markRetired, retiredSignal
cmd/cli/main.go — handleAgents list/get/retire dispatch
docs/architecture.md, docs/concepts.md,
docs/testing-guide.md — retirement contract narrative
internal/api/handler/agents.go — RetireAgent, status surface, 410 on heartbeat
internal/api/handler/agent_handler_test.go — extended coverage
internal/api/handler/agent_retire_handler_test.go — new
internal/api/router/router.go — /agents/retired before /agents/{id}
internal/cli/agent_retire_test.go — new
internal/cli/client.go — ListRetiredAgents + RetireAgent
internal/domain/connector.go — IsRetired, SentinelAgentIDs,
IsSentinelAgent, AgentDependencyCounts,
ActorTypeAgent/System
internal/domain/connector_test.go — new
internal/integration/lifecycle_test.go — retirement fixture
internal/mcp/client.go — DeleteWithQuery additive transport
internal/mcp/retire_agent_test.go — new
internal/mcp/tools.go, internal/mcp/types.go — retire_agent tool + Force/Reason inputs
internal/repository/interfaces.go — AgentRepository retirement methods
internal/repository/postgres/agent.go — retire + cascade target retire + counts
internal/repository/postgres/migration_000015_test.go — new
internal/service/agent.go — wire into AgentService surface
internal/service/agent_retire.go — new 8-step contract
internal/service/agent_retire_test.go — new
internal/service/deployment.go — skip retired agents
internal/service/target.go — skip retired agents
internal/service/testutil_test.go — shared mocks extended
migrations/000015_agent_retire.up.sql — new
migrations/000015_agent_retire.down.sql — new
web/src/api/client.ts, types.ts + tests — retire endpoint wiring
web/src/pages/AgentsPage.tsx — retire UI
|
||
|
|
3287e174dc |
Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
|
||
|
|
a53a4b845b |
fix(gui,api): close C-001 + C-002 — ownership + agent FK contract
C-001 — CreateCertificate was server-accepted with null owner_id,
team_id, renewal_policy_id because the GUI neither collected the fields
nor enforced them, even though the backend's ManagedCertificate schema
and handler contract treat them as required. Fix the contract at all
four layers:
- web/src/pages/CertificatesPage.tsx: replace owner_id/team_id free-
text inputs with <select> elements fed by getOwners/getTeams/
getPolicies queries; mark all three required; gate the Create
button on owner_id + team_id + renewal_policy_id being set.
- internal/api/handler/certificates.go: ValidateRequired for
owner_id, team_id, renewal_policy_id on CreateCertificate so the
handler returns HTTP 400 with the offending field name before the
service layer is reached.
- internal/mcp/types.go: drop ',omitempty' from
CreateCertificateInput.RenewalPolicyID so the MCP schema reflects
the required contract; Update inputs keep partial-update semantics.
- api/openapi.yaml: 'required: [name, common_name, renewal_policy_id,
issuer_id, owner_id, team_id]' was already present on the Create
schema; clarified DeploymentTarget.agent_id description to note the
FK contract.
C-002 — CreateTargetWizard accepted an empty or bogus agent_id and the
service inserted directly, producing a Postgres 23503 FK-violation that
bubbled out as a generic HTTP 500. The FK itself (migration 000001 line
104: agent_id TEXT NOT NULL REFERENCES agents(id)) is correct; we keep
the schema strict and add validation at three layers:
- internal/service/target.go: introduce
ErrAgentNotFound sentinel and pre-validate agent_id in
TargetService.CreateTarget — empty string returns
'agent_id is required'; a nonexistent id returns the full
'referenced agent does not exist: <id>' error. Both wrap
ErrAgentNotFound via fmt.Errorf %w so callers can use errors.Is.
- internal/api/handler/targets.go: ValidateRequired on agent_id; map
errors.Is(err, service.ErrAgentNotFound) to HTTP 400 instead of
letting it fall through to the generic 500 branch.
- internal/mcp/types.go: drop ',omitempty' from
CreateTargetInput.AgentID to match the required contract.
- web/src/pages/TargetsPage.tsx: replace the free-text Agent ID input
with a <select> populated from getAgents(); include agent in the
canProceedToReview gate so Next is disabled until an agent is
chosen.
Regression coverage (21 new subtests total):
- TestCreateCertificate_MissingRequiredField_Returns400 — 6 subtests,
one per required field, each proves the handler guard fires before
the mock service is called.
- TestCreateTarget_MissingAgentID_Returns400 — handler guard.
- TestCreateTarget_NonexistentAgent_Returns400 — pins the
ErrAgentNotFound -> 400 translation.
- TestTargetService_CreateTarget_MissingAgentID — errors.Is sentinel.
- TestTargetService_CreateTarget_NonexistentAgentID — errors.Is.
- The existing TestTargetService_CreateTarget_Success, along with
TestCreateTarget_{MissingName,MissingType,NameTooLong}_* handler
tests, were updated to seed a real agent or include agent_id in
the request body so the happy paths still run cleanly.
Gates (Phase 4):
- go build/vet/test/race: green
- go test -cover: internal/service 68.7% (gate 55%),
internal/api/handler 78.9% (gate 60%)
- golangci-lint on service+handler+mcp: 0 issues
- govulncheck: no reachable vulns
- tsc --noEmit: clean
- vitest: 223/223 passing
See cowork/certctl-coverage-gap-audit.md entries C-001 and C-002.
|
||
|
|
eef1db0f0a |
fix(policies): stop 400ing the "+ New Policy" button + add per-rule severity (D-005, D-006)
Coverage Gap Audit findings D-005 (P0) + D-006 (P1) fixed together in a
single commit because they share the same root cause — policy CRUD sending
values the backend silently rejects — and splitting them would leave a
half-working UI between commits.
## D-005 (P0): PoliciesPage dropdown 400s every Create Policy
Root cause
----------
`web/src/pages/PoliciesPage.tsx` populated the Type `<select>` from a
hardcoded `['key_algorithm', 'ownership', 'allowed_issuers', ...]` array.
The backend's `internal/api/handler/validators.go::ValidatePolicyType`
enforces the TitleCase allowlist `AllowedIssuers`, `AllowedDomains`,
`RequiredMetadata`, `AllowedEnvironments`, `RenewalLeadTime` — defined in
`internal/domain/policy.go`. Every Create Policy request was rejected with
`400 invalid policy type`. The error surfaced only as a transient toast;
the modal closed anyway. Silent user-visible failure.
Fix
---
- `web/src/api/types.ts`: added `POLICY_TYPES` and `POLICY_SEVERITIES`
tuples with `as const` and narrowed `PolicyRule.type`, `.severity`, and
`PolicyViolation.severity` to the literal-union types. Dropdown is now
sourced from the tuple; casing drift becomes a compile error.
- `web/src/pages/PoliciesPage.tsx`: rekeyed `severityStyles` /
`severityDots` to the TitleCase values, added `humanize()` for display
(AllowedIssuers → "Allowed Issuers"), removed the `badge-neutral`
fallback that was papering over the mismatch.
- `web/src/api/types.test.ts` (new): pins both tuples exactly. If anyone
edits one side of the frontend/backend contract without the other, CI
fails with a clear assertion. Pure-TS vitest, no RTL dependency.
## D-006 (P1): `severity` field silently dropped on create/update
Root cause
----------
`PolicyRule` had no `Severity` field in `internal/domain/policy.go`. The
frontend has always sent `severity` on create/update, but Go's
`json.Decoder` (default settings, no `DisallowUnknownFields`) silently
dropped it. The value never reached PostgreSQL. Every rule rendered with
the same severity because there was no severity — just a display
computation downstream.
Fix: option (b), full-stack schema add (not delete-the-field)
-------------------------------------------------------------
- Migration `000013_policy_rule_severity` (up + down): adds
`severity VARCHAR(50) NOT NULL DEFAULT 'Warning'` to `policy_rules` with
CHECK constraint `severity IN ('Warning', 'Error', 'Critical')`. No
index — three-value column on a low-thousands-rows table, planner will
seq-scan regardless. PG 11+ metadata-only ADD COLUMN, safe on live data.
- `internal/domain/policy.go`: added `Severity PolicySeverity` field.
- `internal/repository/postgres/policy.go`: plumbed `severity` through
ListRules SELECT + Scan, GetRule SELECT + Scan, CreateRule INSERT,
UpdateRule UPDATE (4 queries).
- `internal/service/policy.go::UpdatePolicy`: if the client omits
severity on a PUT (zero-value empty string), fetch the existing rule
and preserve its severity. Without this, partial updates would trip the
NOT NULL CHECK and 500. Preserves pre-existing behavior for Name/Type
(out of scope).
- `internal/api/handler/policies.go::CreatePolicy`: default empty severity
to `'Warning'`, then validate via `ValidatePolicySeverity`. 400 with
clear message instead of 500 on CHECK violation. `UpdatePolicy`:
validates severity only when provided.
- `internal/mcp/types.go` + `internal/mcp/tools.go`: added optional
`severity` on the MCP `create_policy` / `update_policy` tool inputs so
LLM callers stay in sync with the wire contract.
- `api/openapi.yaml`: added `severity` to the `PolicyRule` schema with
the enum and default.
Acceptance criterion (user-defined)
-----------------------------------
"Create a rule with severity=Critical, reload the page, and still see
Critical — no silent drops." Verified end-to-end: frontend sends
`severity: "Critical"`, handler validates, service persists, DB stores,
GET returns, React renders the correct badge.
Seed data
---------
`migrations/seed.sql`: four demo rules now have differentiated severities
— `pr-require-owner` → Warning, `pr-allowed-environments` → Error,
`pr-max-certificate-lifetime` → Critical, `pr-min-renewal-window` →
Warning. The user called out that seeding all four at the same severity
makes the feature look decorative; differentiation demonstrates the
column carries real signal.
## Integration test fix (side effect of D-006)
`internal/integration/e2e_test.go::TestCrossResourceWorkflow/CreatePolicy`
was sending `"severity": "High"` — a value from the pre-audit severity
vocabulary that the new `ValidatePolicySeverity` correctly rejects with
400. Changed to `"Error"` (closest semantic match in the new TitleCase
allowlist). Only severity reference in the integration/ directory;
verified via grep.
## Out of scope, logged for follow-up (d/D-008)
Three policy-engine drift issues orthogonal to D-005 + D-006, explicitly
deferred per direction:
1. `migrations/seed.sql` policy_rules INSERTs use lowercase TYPE values
(`'ownership'`, `'environment'`, `'lifetime'`, `'renewal_window'`).
These are load-bearing on `internal/service/policy.go::evaluateRule`'s
`switch rule.Type` (which also uses the lowercase strings). Migrating
requires coordinated changes across seed + evaluation engine.
2. `migrations/seed_demo.sql:482-483` contains lowercase `'critical'`
severity — will now fail the new CHECK constraint. Separate fix.
3. `evaluateRule` hardcodes `Severity: domain.PolicySeverityWarning` on
emitted violations and ignores the configured `rule.Config`. The new
severity column is read correctly on the CRUD path but not yet
consulted during evaluation.
## Verification
Backend:
- `go build ./...` — clean
- `go vet ./...` — clean
- `go test -short ./...` — all packages green, including
`internal/service` (policy service), `internal/api/handler` (policy +
MCP handler tests), `internal/integration` (e2e_test.go after fix),
`internal/domain`, `internal/repository/postgres`.
Frontend:
- `tsc --noEmit` — clean
- `vitest run` — 223/223 passing (4 new assertions in types.test.ts)
- `vite build` — clean (only the pre-existing chunk-size warning)
|
||
|
|
13cd4d98ba |
feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id, owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke), server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI spec, compliance mapping updates, and 21 new tests (12 service, 7 handler, 1 CLI, 1 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a8fc177118 |
fix: resolve NULL csr_pem scan errors and QA smoke test failures
Root cause: certificate_versions.csr_pem is nullable in the schema but Go code scanned it into a plain string. Used sql.NullString in ListVersions and GetLatestVersion to handle NULL values correctly. Also includes: partial update fetch-merge-update pattern to prevent FK violations, nil directory guard in discovery service, diagnostic slog logging in handlers, export handler 422 for unparseable PEM, OpenAPI spec corrections, MCP tool description improvements, and test fixes. Rewrites the Release Sign-Off section in testing-guide.md to individual test-level granularity (320 rows) with smoke test results audited and checked off (121 pass, 5 skip, 194 manual remaining). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ec21c9bb29 |
feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing with cert ID computation, directory endpoint discovery, graceful degradation for non-ARI CAs. 19 tests. M29: Email notifier wiring + scheduled certificate digest — SMTP connector bridged to service layer via NotifierAdapter, DigestService with HTML email template, 7th scheduler loop (24h), digest preview/send API endpoints and GUI card. 21 tests. M30: Production-ready Helm chart — server Deployment, PostgreSQL StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security contexts, health probes, example values for dev/prod/ACME scenarios. Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job, documentation updates across 5 doc files and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
43a03c168c |
fix: Go 1.25 upgrade, codebase audit fixes, MCP server tests
Upgrade from Go 1.22 to 1.25 (minimum for MCP SDK, actively supported). CI updated to match. Codebase audit fixes: - Local CA parseIP() now uses net.ParseIP — IP SANs no longer silently dropped - Nil pointer guards in agent.go GetWorkWithTargets for target/cert enrichment - MCP CreateCertificateInput marks owner_id/team_id as required - NGINX connector uses CombinedOutput() — captures diagnostic output on failure - Jobs handler validates JSON decode on rejection body — returns 400 on malformed - CRL/OCSP handlers propagate requestID for error tracing MCP server tests (26 tests): - client_test.go: HTTP client coverage (GET/POST/PUT/DELETE, auth, 204, errors, binary) - tools_test.go: tool registration, pagination, end-to-end flows with mock API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
956230aec1 |
feat: M18a — MCP server exposing all 76 API endpoints as AI-native tools
Separate standalone binary (cmd/mcp-server/) using official MCP Go SDK (modelcontextprotocol/go-sdk v1.4.1) with stdio transport. Stateless HTTP proxy translates MCP tool calls to certctl REST API requests. 76 tools across 16 resource domains with typed input structs and jsonschema tags for automatic LLM-friendly schema generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |