ci(guards): fix G-3 (CERTCTL_MCP_READ_ONLY phantom) + S-1 (hardcoded 45)

Two CI guards tripped on the B4 + B5 closure commits: 1. G-3 env-docs-drift caught `CERTCTL_MCP_READ_ONLY` mentioned in docs/operator/security-bundle-5-audit-closure.md (Bundle 5 S8 row) without a corresponding entry in internal/config/config.go. The env var is a v3 idea, not a shipped feature — the doc now describes the future gate without naming the literal env var, matching the G-3 phantom-env-var contract. 2. S-1 hardcoded-source-counts caught "all 45 migrations" in docs/operator/scheduler-ha.md (Bundle 4 D8 closure prose). Per the CLAUDE.md operating rule "Numeric claims about current state rot", swapped the literal count for the rebuild command `ls migrations/*.up.sql | wc -l`. Both fixes are doc-only — no code change, no test change. The underlying Bundle 4 + Bundle 5 closures stand. Verification: bash scripts/ci-guards/G-3-env-docs-drift.sh # clean bash scripts/ci-guards/S-1-hardcoded-source-counts.sh # clean
2026-07-26 13:58:13 +00:00 · 2026-05-13 01:24:06 +00:00
parent 596e675ec7
commit 264015059d
2 changed files with 2 additions and 2 deletions
@@ -41,7 +41,7 @@ The 15 loops live in `internal/scheduler/scheduler.go`. Each is a `func (s *Sche
 Bundle 4 does NOT introduce leader election. It introduces:

 1. **Documented HA truth table** (this page) — operators know exactly which loops are safe to multi-replica and which produce operator-observable duplicates.
-2. **Migration HA** via `pg_advisory_lock` + `schema_migrations` audit table (see `internal/repository/postgres/db.go::RunMigrations`). Pre-Bundle-4 every replica race-ran all 45 migrations on boot. Post-Bundle-4 the first replica acquires the lock, applies migrations, populates `schema_migrations`, releases the lock. Subsequent replicas block at the lock, then observe the audit table and skip every already-applied file.
+2. **Migration HA** via `pg_advisory_lock` + `schema_migrations` audit table (see `internal/repository/postgres/db.go::RunMigrations`). Pre-Bundle-4 every replica race-ran the full migrations directory on boot (count via `ls migrations/*.up.sql | wc -l`). Post-Bundle-4 the first replica acquires the lock, applies migrations, populates `schema_migrations`, releases the lock. Subsequent replicas block at the lock, then observe the audit table and skip every already-applied file.
 3. **Rate-limit scope statement** at `docs/operator/rate-limit-scope.md` — process-local per-replica, restart-safe.

 ## What Bundle 4 does NOT close (deferred, tracked in WORKSPACE-ROADMAP.md)
@@ -14,7 +14,7 @@ Closure summary for Bundle 5 of the 2026-05-12 acquisition diligence audit — t
 | **S3** | Med | Named API keys parsed but validation requires `Secret` | **Operator decision** | `CERTCTL_API_KEYS_NAMED` is parsed into `cfg.Auth.NamedKeys` at startup. The validator wiring is partial — operator needs to confirm whether to (a) wire `NamedKeys` end-to-end into the API-key auth middleware path or (b) deprecate the `NamedKeys` syntax and document the legacy `CERTCTL_AUTH_SECRET` rotation pattern as canonical. v3 work item. |
 | **S4** | Med | OIDC email-domain allowlist defaults open | **Verified safe (existing)** | Test pins at `internal/auth/oidc/email_domain_test.go::TestEmailDomainAllowlist_MatchSemantics` — empty allowlist accepts all (intentional, mirrors RFC 9700 §4.1.1 "no domain constraint" default); operators set `AllowedEmailDomains` per-provider to constrain. `ErrEmailDomainNotAllowed` is the rejection sentinel; the subdomain-NOT-auto-accepted test row pins the strict equality semantics. The "defaults open" framing was correct; the constraint is operator-configurable per provider rather than a global gate. |
 | **S5** | Med | HTTP audit logging is best-effort at request time | **Operator decision** | `internal/api/middleware/middleware.go::NewAuditLog` records every API call asynchronously after the handler completes; a database write failure is logged but does not fail the request. For security-critical write paths (`POST /api/v1/auth/role-grants`, RBAC role mutations, certificate revocation) the service layer uses `RecordEventWithCategoryWithTx` to bind the audit row to the same transaction as the state change — those paths are fail-closed. The middleware-level "best effort" framing applies to read-paths + non-critical writes only. Operator decides whether to escalate any specific read path to fail-closed audit; tracked in `docs/operator/auth-threat-model.md`. |
-| **S8** | Med | MCP exposes mutating tools without local auth or read-only mode | **Threat-model documented** | `cmd/mcp-server/main.go` is a stdio-transport binary that forwards every tool invocation through the certctl server's REST API. Every tool call carries `CERTCTL_API_KEY` and is authenticated + RBAC-gated server-side identically to a CLI call. The "without local auth" framing assumes a model where the MCP binary itself is a privilege boundary; in certctl's design it is not — it's a thin protocol bridge with no privileges of its own. The threat model + an optional `CERTCTL_MCP_READ_ONLY=true` gate (which short-circuits any tool whose name doesn't match `^list_|^get_|^describe_`) are tracked in `WORKSPACE-ROADMAP.md` as a v3 hardening item. |
+| **S8** | Med | MCP exposes mutating tools without local auth or read-only mode | **Threat-model documented** | `cmd/mcp-server/main.go` is a stdio-transport binary that forwards every tool invocation through the certctl server's REST API. Every tool call carries the operator-supplied MCP API key and is authenticated + RBAC-gated server-side identically to a CLI call. The "without local auth" framing assumes a model where the MCP binary itself is a privilege boundary; in certctl's design it is not — it's a thin protocol bridge with no privileges of its own. The threat model + an optional read-only env-var gate (which would short-circuit any tool whose name doesn't match `^list_|^get_|^describe_`) is tracked in `WORKSPACE-ROADMAP.md` as a v3 hardening item; the env var itself is not yet defined in `internal/config/config.go`. |
 | **R6** | Med | OIDC discovery + test endpoints lack SSRF-safe HTTP transport | **Closed (code)** | `internal/auth/oidc/test_discovery.go::jwksReachable` now uses an `http.Client` whose transport wraps `validation.SafeHTTPDialContext(oidcOutboundTimeout)`. Pre-Bundle-5 the probe used `http.DefaultClient` — a JWKS URI pointing at `169.254.169.254` could pivot into instance metadata. Note: the go-oidc library's internal JWKS fetcher (used by the production token-verify path, not the dry-run probe) is still on `http.DefaultClient`; wrapping that requires custom `coreos/go-oidc` transport injection — tracked as a v3 follow-up item. |
 | **R7** | Med | Slack and Teams notifiers do not use the hardened SSRF client | **Closed (code)** | `internal/connector/notifier/slack/slack.go::New` and `internal/connector/notifier/teams/teams.go::New` both build their `http.Client` with `validation.SafeHTTPDialContext`. Webhook URLs flow through the dynamic-config GUI and could carry an SSRF pivot in the wrong RBAC scope; the dial-time guard rejects reserved-address ranges before any byte goes out. Mirrors the existing `internal/connector/notifier/webhook` hardening. |
 | **SEC-H1** | High | 4 open CRIT items from 2026-05-10 auth audit block v2.1.0 | **Operator validation needed** | git log shows CRIT-1 (`457962f`), CRIT-2 (`c07825b`), CRIT-4 (`a89c69b`) closure commits on master. CRIT-3 and CRIT-5 don't have explicit closure-tag commits but may have been folded into Auth Bundle 2 phases (`5204f1b` Phase 7 + Phase 7.5 covers break-glass + OIDC-first-admin). The audit-bundles-fixes-2026-05-10 spec folder is operator-workstation-local; the sandbox can't confirm CRIT-3/5 status against that source. Operator follow-up: run `git log --grep='CRIT-3\\|CRIT-5'` on workstation, validate against the spec; if any remain open they block v2.1.0 tag (per CLAUDE.md `v2.1.0 gate`). |