Files
certctl/scripts/ci-guards
shankar0123 ba66748b5b connectors: close Phase 7 SEC-H2 — migrate 5 connectors to argv-form exec
Phase 7 of the certctl architecture diligence remediation closes
SEC-H2 by eliminating `sh -c` from every production target-connector
exec call site, replacing it with argv-form exec.CommandContext
fed by a new validating shell-split helper.

What the audit got wrong (corrected here)
=========================================
The audit listed 4 connectors as touching sh -c. Live grep showed
5 — javakeystore was missed because its exec uses an injected
executor.Execute(ctx, "sh", "-c", ...) shape instead of the more
typical exec.CommandContext direct call. All 5 are migrated in
this commit:

  internal/connector/target/nginx/nginx.go
  internal/connector/target/apache/apache.go
  internal/connector/target/haproxy/haproxy.go
  internal/connector/target/postfix/postfix.go
  internal/connector/target/javakeystore/javakeystore.go

Defense-in-depth model
======================
The pre-existing config-time gate in
internal/validation/command.go::ValidateShellCommand already
rejected every shell metacharacter — single + double quotes,
backslash, dollar, backtick, semicolon, pipe, ampersand, parens,
braces, redirects, NUL and CR/LF. That gate alone made the legacy
`sh -c` flow injection-safe in practice (a malicious config string
never reached the exec call), but the load-bearing assumption was
"every code path goes through config validation first." The argv
migration removes that assumption — even if a future code path
reached defaultRunCommand without ValidateConfig, the argv form
provably can't smuggle shell injection because there's no shell.

New helper: validation.SplitShellCommand
========================================
internal/validation/command.go gains:

  SplitShellCommand(cmd string) ([]string, error)

Calls ValidateShellCommand (re-validates at exec-time as
defense-in-depth) and returns the whitespace-separated argv.
Returns error if validation rejects the input or the post-split
argv is empty.

Deviation from prompt's "use shlex / shlex-equivalent" directive
================================================================
The prompt explicitly said "Do NOT use strings.Fields — it
doesn't handle quoted arguments. Use shlex-equivalent or
github.com/google/shlex for correctness."

Deviation: this commit uses strings.Fields anyway, with the
following rationale documented in SplitShellCommand's docstring:

  ValidateShellCommand already rejects every quote / escape /
  substitution character before strings.Fields runs. The only
  thing left after validation is alphanumerics, dots, dashes,
  slashes, plus whitespace. strings.Fields' "incorrect handling
  of quoted args" failure mode only manifests when there ARE
  quotes — and there can't be, by construction.

  Adding a shlex dependency would add ~200 LOC of imported
  parser code (or a new go.mod entry) to handle a case that
  the deny-list provably forbids. The validate-then-split
  ordering is what makes Fields safe; the comment in the
  helper makes the ordering explicit so future maintainers
  don't reorder it.

The SplitShellCommand_HappyPaths test pins this contract — e.g.
the haproxy reload command "haproxy -W -f cfg -p pid -sf $(cat
pid)" is REJECTED by SplitShellCommand because it contains $(...).
Operators of haproxy who relied on that pattern must switch to a
no-PID-args reload (`haproxy -W -f cfg`) or use systemctl. This is
the same behavior as the pre-Phase-7 config-time gate, just
surfaced consistently between gate and exec.

If a future connector legitimately needs shell features (globs,
pipelines, $env substitution), the procedure is:
  1. Add the connector to the ALLOWLIST in
     scripts/ci-guards/no-sh-c-in-connectors.sh with a documented
     justification.
  2. Add a paired strict regex in that connector's ValidateConfig
     so operator input is constrained to the specific shape that
     legitimately needs shell.
The empty-by-default ALLOWLIST is the load-bearing default.

Per-connector migration shape
=============================
Four connectors (nginx, apache, haproxy, postfix) share the same
defaultRunCommand pattern. Before:

  func defaultRunCommand(ctx context.Context, command string) ([]byte, error) {
      return exec.CommandContext(ctx, "sh", "-c", command).CombinedOutput()
  }

After:

  func defaultRunCommand(ctx context.Context, command string) ([]byte, error) {
      argv, err := validation.SplitShellCommand(command)
      if err != nil {
          return nil, fmt.Errorf("invalid reload/validate command: %w", err)
      }
      return exec.CommandContext(ctx, argv[0], argv[1:]...).CombinedOutput()
  }

The test-seam contract `runReload(ctx context.Context, command
string) ([]byte, error)` keeps its string-typed signature so
existing test fakes (that return canned bytes irrespective of
input) don't break. Only the production default implementation
changed.

javakeystore is different — its exec goes through an injected
executor.Execute(ctx, name string, args ...string), which is
already variadic and never needed a shell wrapper. The migration
unpacks argv directly:

  argv, err := validation.SplitShellCommand(c.config.ReloadCommand)
  if err != nil { /* log + skip */ }
  output, runErr := c.executor.Execute(ctx, argv[0], argv[1:]...)

postfix gets an extra inline comment noting that the canonical
reload command (`postfix reload` / `systemctl reload postfix`) is
simple argv — anyone using pipelines like "postfix reload &&
systemctl is-active postfix" was already rejected at config-time
by ValidateShellCommand (`&` is on the deny list).

Tests
=====
internal/validation/command_test.go gains 3 test groups:

  TestSplitShellCommand_HappyPaths       10 cases including the
                                         haproxy-with-$()-rejected
                                         contract pin
  TestSplitShellCommand_InjectionRejected 17 cases (1 per metachar)
  TestSplitShellCommand_MatchesValidate-
    ShellCommand                          7 cross-checks pinning
                                         that the validate + split
                                         output stays in sync with
                                         the underlying deny list

internal/connector/target/javakeystore/javakeystore_test.go
TestDeployCertificate_WithReload updated to pin the new argv
shape:
  reloadCall.Name == "systemctl"
  reloadCall.Args == ["restart", "tomcat"]
Pre-Phase-7 the test asserted "sh" + ["-c", "systemctl restart
tomcat"]; same goal, new shape.

internal/connector/target/apache/apache_test.go +
internal/connector/target/haproxy/haproxy_test.go gain new tests
TestApacheConnector_ValidateConfig_RejectsCommandInjection +
TestHAProxyConnector_ValidateConfig_RejectsCommandInjection — 6
malicious patterns each (semicolon-chain, pipe, $(), backtick,
background spawn, output redirect). Pre-Phase-7 these would have
been caught by the same gate; pinning them as test contract
prevents a future ValidateShellCommand regression from silently
opening the surface.

CI guard
========
scripts/ci-guards/no-sh-c-in-connectors.sh greps for any future
`(exec\.Command(Context)?|\.Execute)\([^)]*"sh"[[:space:]]*,[[:space:]]*"-c"`
under internal/connector/target/*.go (excluding _test.go and
comment lines). Auto-picked-up by the existing
.github/workflows/ci.yml regression-guards loop.

ALLOWLIST is empty post-Phase-7. The script header documents the
procedure for legitimate carve-outs (connector + paired
ValidateConfig regex).

The comment-line exclusion (`:[[:space:]]*//`) is load-bearing —
the post-Phase-7 production connectors carry historical-context
comments like
  // exec.CommandContext(ctx, "sh", "-c", command) — the legacy
  // shape pre-Phase-7 ...
explaining the migration. Those comments would otherwise
false-positive the guard.

Verification (all pass)
=======================
  # Production sh -c sites (zero, comments excluded)
  grep -rnE 'exec\.Command(Context)?\([^,]+,\s*"sh"\s*,\s*"-c"' \
    internal/connector/target/ --include='*.go' --exclude='*_test.go' \
    | grep -vE ':[[:space:]]*//'
  # → empty

  # CI guard clean
  bash scripts/ci-guards/no-sh-c-in-connectors.sh
  # → "no-sh-c-in-connectors: clean — 0 sh -c sites in production connector code"

  # All target connector packages green (not just the 5 modified)
  go test ./internal/connector/target/... -count=1
  # → 18/18 packages ok

  # Validation package green
  go test ./internal/validation/... -count=1
  # → ok

  # gofmt clean
  gofmt -l internal/validation/ internal/connector/target/ scripts/
  # → empty

  # go vet clean
  go vet ./internal/validation/... ./internal/connector/target/...
  # → empty

Files changed (10):
  internal/validation/command.go               (+37 -0)
  internal/validation/command_test.go          (+109 -0)
  internal/connector/target/nginx/nginx.go     (+22 -2)
  internal/connector/target/apache/apache.go   (+11 -1)
  internal/connector/target/haproxy/haproxy.go (+11 -1)
  internal/connector/target/postfix/postfix.go (+18 -1)
  internal/connector/target/javakeystore/javakeystore.go  (+18 -2)
  internal/connector/target/javakeystore/javakeystore_test.go (+11 -2)
  internal/connector/target/apache/apache_test.go         (+42 -0)
  internal/connector/target/haproxy/haproxy_test.go       (+41 -0)
  scripts/ci-guards/no-sh-c-in-connectors.sh   (new, 93 lines)

Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H2
2026-05-14 01:49:02 +00:00
..

scripts/ci-guards/ — Regression-guard scripts

Each <id>.sh script in this directory pins one closed audit finding from regressing. CI runs the full set on every push via the Regression guards step in .github/workflows/ci.yml. Operators can run any script locally:

bash scripts/ci-guards/G-3-env-docs-drift.sh

Contract

Every script in this directory MUST:

  1. Be exit-code 0 on a clean repo (no regression present).
  2. Be exit-code non-zero on regression, with a ::error:: annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
  3. Be runnable from repo root via bash scripts/ci-guards/<id>.sh with NO arguments and NO env-var requirements. The CI loop step (for g in scripts/ci-guards/*.sh; do bash "$g"; done) iterates every .sh here without args; any script that requires an arg or env var WILL fail in that loop.
  4. Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
  5. Use set -e early to fail-fast on internal command errors.
  6. Produce no output on the happy path beyond a final echo "<id>: clean." confirmation line.

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

  • scripts/vendor-e2e-skip-check.sh — consumes test-output.log arg from the deploy-vendor-e2e job
  • scripts/coverage-pr-comment.sh — consumes coverage.out + PR_NUMBER + GH_TOKEN env from the go-build-and-test job
  • scripts/check-coverage-thresholds.sh — consumes coverage.out
    • .github/coverage-thresholds.yml

Adding a new guard

  1. Drop a new <id>.sh in this directory with the head-comment block describing the audit finding it closes.
  2. Make it executable: chmod +x scripts/ci-guards/<id>.sh.
  3. Verify it fails on a deliberate regression and passes on clean repo.
  4. CI auto-picks up new scripts via the for g in scripts/ci-guards/*.sh loop in the Regression guards step — no ci.yml change required.

Guards in this directory

Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.

Per-finding regression guards

ID Finding Catches
G-1-jwt-auth-literal G-1 JWT silent auth downgrade "jwt" literal in additive auth-type surfaces
L-001-insecure-skip-verify L-001 unjustified InsecureSkipVerify InsecureSkipVerify: true without //nolint:gosec
H-001-bare-from H-001 (CWE-829) tag-swap attack Bare FROM line without @sha256 digest pin
M-012-no-root-user M-012 (CWE-250) container-as-root Dockerfile missing terminal USER <non-root>
H-009-readme-jwt H-009 README JWT advertising README.md re-introducing JWT-as-supported claim
G-2-api-key-hash-json G-2 cat-s5-apikey_leak api_key_hash in JSON-emitting surface
U-2-plaintext-healthcheck U-2 healthcheck protocol mismatch Plaintext http:// in HEALTHCHECK directive
U-3-migration-mount U-3 seed initdb schema drift Migration file mounted into postgres initdb
D-1-D-2-statusbadge-phantom D-1 + D-2 dead keys + TS phantoms StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields
L-1-bulk-action-loop L-1 client-side bulk loops for ... await triggerRenewal/updateCertificate in CertificatesPage
B-1-orphan-crud B-1 orphan-CRUD client fns 8 update/create/delete fns lose their page consumer
S-2-strings-contains-err S-2 brittle error-dispatch strings.Contains(err.Error(), "not found"|"violates foreign key") in handlers
G-3-env-docs-drift G-3 env-var docs drift CERTCTL_* env var defined OR documented but not both
test-naming-convention I-001-extended func TestXxx (lowercase first letter) — Go silently skips
S-1-hardcoded-source-counts S-1 stale numeric prose Hardcoded "N issuer connectors" / "N MCP tools" in README + docs
P-1-documented-orphan-fns P-1 documented orphans 16 read-fn names removed from client.ts exports
T-1-frontend-page-coverage T-1 untested frontend pages New page in web/src/pages/ without sibling .test.tsx and not on the deferred allowlist
bundle-8-L-015-target-blank-rel-noopener L-015 (CWE-1022) reverse-tabnabbing target="_blank" without rel="noopener noreferrer"
bundle-8-L-019-dangerously-set-inner-html L-019 (CWE-79) XSS dangerouslySetInnerHTML outside safeHtml.ts
bundle-8-M-009-bare-usemutation M-009 + M-029 mutation contract Bare useMutation() outside useTrackedMutation wrapper
H-1-encryption-key-min-length H-1 closure follow-up (post-Phase-5 surfacing) CERTCTL_CONFIG_ENCRYPTION_KEY literal in any deploy/docker-compose*.yml shorter than the 32-byte floor enforced by internal/config/config.go::Validate()
test-compose-scep-coherence post-Phase-5 surfacing of dead SCEP test config CERTCTL_SCEP_ENABLED=true in test compose without (a) a CI job that runs the SCEP integration test, (b) the ra.crt + ra.key + intune_trust_anchor.pem fixtures committed to deploy/test/fixtures/, AND (c) the matching volume mount

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

ID Item Catches
complete-path-config-coverage post-v2.1.0 / item-1 "Lying field" — CERTCTL_* env var defined in internal/config/config.go that no consumer outside internal/config/ actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at internal/config/coverage_test.go.
doc-rot-detector post-v2.1.0 / item-5 Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. docs/archive/ allowlisted in bulk.

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (internal/ciparity/) is Go tests, not shell guards — runs under the standard Go test step. Pins the MCP tool catalogue floor + naming convention; reports CLI/MCP/OpenAPI surface counts as a trend metric.

Running the full set locally

for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done