Phase 7 of the certctl architecture diligence remediation closes
SEC-H2 by eliminating `sh -c` from every production target-connector
exec call site, replacing it with argv-form exec.CommandContext
fed by a new validating shell-split helper.
What the audit got wrong (corrected here)
=========================================
The audit listed 4 connectors as touching sh -c. Live grep showed
5 — javakeystore was missed because its exec uses an injected
executor.Execute(ctx, "sh", "-c", ...) shape instead of the more
typical exec.CommandContext direct call. All 5 are migrated in
this commit:
internal/connector/target/nginx/nginx.go
internal/connector/target/apache/apache.go
internal/connector/target/haproxy/haproxy.go
internal/connector/target/postfix/postfix.go
internal/connector/target/javakeystore/javakeystore.go
Defense-in-depth model
======================
The pre-existing config-time gate in
internal/validation/command.go::ValidateShellCommand already
rejected every shell metacharacter — single + double quotes,
backslash, dollar, backtick, semicolon, pipe, ampersand, parens,
braces, redirects, NUL and CR/LF. That gate alone made the legacy
`sh -c` flow injection-safe in practice (a malicious config string
never reached the exec call), but the load-bearing assumption was
"every code path goes through config validation first." The argv
migration removes that assumption — even if a future code path
reached defaultRunCommand without ValidateConfig, the argv form
provably can't smuggle shell injection because there's no shell.
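To make the "no shell" point concrete, here is a minimal sketch (not code from this commit) that builds both command shapes with the standard os/exec package and prints the argument vector each one would hand to the operating system. The payload string and the helper names legacyArgs / argvArgs are made up for illustration; nothing is executed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// legacyArgs returns the argv of the pre-Phase-7 shape: the whole command
// string is one argument that sh will parse, metacharacters included.
func legacyArgs(command string) []string {
	return exec.Command("sh", "-c", command).Args
}

// argvArgs returns the argv of the post-Phase-7 shape: every element is
// passed to the program verbatim and no shell ever parses it.
func argvArgs(argv []string) []string {
	return exec.Command(argv[0], argv[1:]...).Args
}

func main() {
	// With sh -c, the ";" would be interpreted by the shell as a separator.
	fmt.Printf("%q\n", legacyArgs("nginx -s reload; id"))
	// In argv form, "reload;" is just an odd-looking literal argument.
	fmt.Printf("%q\n", argvArgs([]string{"nginx", "-s", "reload;", "id"}))
}
```

In the second shape the worst outcome of a stray metacharacter is a program receiving an argument it does not understand, which is why the argv form does not depend on config validation having run first.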
New helper: validation.SplitShellCommand
========================================
internal/validation/command.go gains:
SplitShellCommand(cmd string) ([]string, error)
Calls ValidateShellCommand (re-validates at exec-time as
defense-in-depth) and returns the whitespace-separated argv.
Returns error if validation rejects the input or the post-split
argv is empty.
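A minimal sketch of the validate-then-split shape described above. The deny list here is a simplified stand-in assembled from the characters named in this message; the real ValidateShellCommand in internal/validation/command.go may differ in its checks and error text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// denied is a hypothetical stand-in for the real deny list: quotes,
// backslash, dollar, backtick, semicolon, pipe, ampersand, parens, braces,
// redirects, NUL and CR/LF.
const denied = "'\"\\$`;|&(){}<>\x00\r\n"

// ValidateShellCommand is a simplified stand-in for the real validator.
func ValidateShellCommand(cmd string) error {
	if i := strings.IndexAny(cmd, denied); i >= 0 {
		return fmt.Errorf("command contains forbidden character %q", cmd[i])
	}
	return nil
}

// SplitShellCommand validates first and only then splits on whitespace.
// The ordering is the point: strings.Fields is safe here only because
// validation has already ruled out quotes, escapes and substitutions.
func SplitShellCommand(cmd string) ([]string, error) {
	if err := ValidateShellCommand(cmd); err != nil {
		return nil, err
	}
	argv := strings.Fields(cmd)
	if len(argv) == 0 {
		return nil, errors.New("command is empty")
	}
	return argv, nil
}

func main() {
	argv, err := SplitShellCommand("systemctl reload nginx")
	fmt.Printf("%q %v\n", argv, err)

	_, err = SplitShellCommand("haproxy -W -f cfg -p pid -sf $(cat pid)")
	fmt.Println(err) // rejected: the command contains '$'
}
```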
Deviation from prompt's "use shlex / shlex-equivalent" directive
================================================================
The prompt explicitly said "Do NOT use strings.Fields — it
doesn't handle quoted arguments. Use shlex-equivalent or
github.com/google/shlex for correctness."
Deviation: this commit uses strings.Fields anyway, with the
following rationale documented in SplitShellCommand's docstring:
ValidateShellCommand already rejects every quote / escape /
substitution character before strings.Fields runs. What is
left after validation is plain tokens (alphanumerics, dots,
dashes, slashes and similar literal characters) separated by
whitespace. strings.Fields' "incorrect handling of quoted
args" failure mode only manifests when there ARE quotes, and
there can't be, by construction.
Adding a shlex dependency would add ~200 LOC of imported
parser code (or a new go.mod entry) to handle a case that
the deny-list provably forbids. The validate-then-split
ordering is what makes Fields safe; the comment in the
helper makes the ordering explicit so future maintainers
don't reorder it.
The TestSplitShellCommand_HappyPaths test pins this contract. For
example, the haproxy reload command "haproxy -W -f cfg -p pid -sf
$(cat pid)" is REJECTED by SplitShellCommand because it contains
$(...).
Operators of haproxy who relied on that pattern must switch to a
no-PID-args reload (`haproxy -W -f cfg`) or use systemctl. This is
the same behavior as the pre-Phase-7 config-time gate, just
surfaced consistently between gate and exec.
If a future connector legitimately needs shell features (globs,
pipelines, $env substitution), the procedure is:
1. Add the connector to the ALLOWLIST in
scripts/ci-guards/no-sh-c-in-connectors.sh with a documented
justification.
2. Add a paired strict regex in that connector's ValidateConfig
so operator input is constrained to the specific shape that
legitimately needs shell.
The empty-by-default ALLOWLIST is the load-bearing default.
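Step 2 of the procedure above could look roughly like the following sketch. It is purely illustrative: the pattern, the function name validateReloadCommand, and the choice of a PID-file reload as the example shape are all assumptions, and nothing here implies that any connector is or should be allowlisted.

```go
package main

import (
	"fmt"
	"regexp"
)

// reloadShape is a hypothetical strict pattern: it admits exactly one
// command shape that needs the shell, with every variable part confined to
// a conservative path character class.
var reloadShape = regexp.MustCompile(
	`^haproxy -W -f [A-Za-z0-9._/-]+ -p [A-Za-z0-9._/-]+ -sf \$\(cat [A-Za-z0-9._/-]+\)$`)

// validateReloadCommand is the kind of check a connector's ValidateConfig
// would run before the command is ever allowed to reach a shell.
func validateReloadCommand(cmd string) error {
	if !reloadShape.MatchString(cmd) {
		return fmt.Errorf("reload command %q does not match the allowed shape", cmd)
	}
	return nil
}

func main() {
	ok := "haproxy -W -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /run/haproxy.pid)"
	fmt.Println(validateReloadCommand(ok))
	fmt.Println(validateReloadCommand(ok + "; id"))
}
```

The design choice is an allow-list of one shape rather than a relaxed deny list: anything that is not exactly the expected command fails closed.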
Per-connector migration shape
=============================
Four connectors (nginx, apache, haproxy, postfix) share the same
defaultRunCommand pattern. Before:
    func defaultRunCommand(ctx context.Context, command string) ([]byte, error) {
        return exec.CommandContext(ctx, "sh", "-c", command).CombinedOutput()
    }
After:
    func defaultRunCommand(ctx context.Context, command string) ([]byte, error) {
        argv, err := validation.SplitShellCommand(command)
        if err != nil {
            return nil, fmt.Errorf("invalid reload/validate command: %w", err)
        }
        return exec.CommandContext(ctx, argv[0], argv[1:]...).CombinedOutput()
    }
The test-seam contract `runReload(ctx context.Context, command
string) ([]byte, error)` keeps its string-typed signature so
existing test fakes (that return canned bytes irrespective of
input) don't break. Only the production default implementation
changed.
javakeystore is different — its exec goes through an injected
executor.Execute(ctx, name string, args ...string), which is
already variadic and never needed a shell wrapper. The migration
unpacks argv directly:
    argv, err := validation.SplitShellCommand(c.config.ReloadCommand)
    if err != nil {
        /* log + skip the reload; argv is not used past this point */
    }
    output, runErr := c.executor.Execute(ctx, argv[0], argv[1:]...)
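The fragment above elides the error branch. A self-contained sketch of the same flow follows, with an early return so argv is never indexed after a failed split. The Executor interface name, the recordingExecutor fake, splitStandIn and demoReload are all hypothetical names introduced for illustration; splitStandIn is a reduced stand-in for validation.SplitShellCommand.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Executor mirrors the variadic Execute shape described above.
type Executor interface {
	Execute(ctx context.Context, name string, args ...string) ([]byte, error)
}

// recordingExecutor is a test fake that records the last call it received.
type recordingExecutor struct {
	Name string
	Args []string
}

func (r *recordingExecutor) Execute(_ context.Context, name string, args ...string) ([]byte, error) {
	r.Name, r.Args = name, args
	return []byte("ok"), nil
}

// splitStandIn is a simplified stand-in for validation.SplitShellCommand.
func splitStandIn(cmd string) ([]string, error) {
	if strings.ContainsAny(cmd, "'\"\\$`;|&(){}<>\x00\r\n") {
		return nil, errors.New("forbidden shell metacharacter")
	}
	argv := strings.Fields(cmd)
	if len(argv) == 0 {
		return nil, errors.New("empty command")
	}
	return argv, nil
}

// reload returns before touching argv when the split fails.
func reload(ctx context.Context, ex Executor, command string) ([]byte, error) {
	argv, err := splitStandIn(command)
	if err != nil {
		return nil, fmt.Errorf("skipping reload: %w", err)
	}
	return ex.Execute(ctx, argv[0], argv[1:]...)
}

// demoReload runs reload against a fresh recording fake.
func demoReload(command string) (*recordingExecutor, error) {
	rec := &recordingExecutor{}
	_, err := reload(context.Background(), rec, command)
	return rec, err
}

func main() {
	rec, err := demoReload("systemctl restart tomcat")
	fmt.Println(rec.Name, rec.Args, err) // systemctl [restart tomcat] <nil>
}
```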
postfix gets an extra inline comment noting that the canonical
reload command (`postfix reload` / `systemctl reload postfix`) is
simple argv. Anyone using command chains like "postfix reload &&
systemctl is-active postfix" was already rejected at config-time
by ValidateShellCommand (`&` is on the deny list).
Tests
=====
internal/validation/command_test.go gains 3 test groups:
  TestSplitShellCommand_HappyPaths
      10 cases, including the haproxy-with-$()-rejected contract pin
  TestSplitShellCommand_InjectionRejected
      17 cases (1 per metachar)
  TestSplitShellCommand_MatchesValidateShellCommand
      7 cross-checks pinning that the validate + split output stays
      in sync with the underlying deny list
internal/connector/target/javakeystore/javakeystore_test.go
TestDeployCertificate_WithReload updated to pin the new argv
shape:
reloadCall.Name == "systemctl"
reloadCall.Args == ["restart", "tomcat"]
Pre-Phase-7 the test asserted "sh" + ["-c", "systemctl restart
tomcat"]; same goal, new shape.
internal/connector/target/apache/apache_test.go +
internal/connector/target/haproxy/haproxy_test.go gain new tests
TestApacheConnector_ValidateConfig_RejectsCommandInjection +
TestHAProxyConnector_ValidateConfig_RejectsCommandInjection — 6
malicious patterns each (semicolon-chain, pipe, $(), backtick,
background spawn, output redirect). Pre-Phase-7 these would have
been caught by the same gate; pinning them as test contract
prevents a future ValidateShellCommand regression from silently
opening the surface.
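The six-pattern contract can be pictured as a small table-driven check. This is a sketch, not the tests added in this commit: the payload strings are invented examples of each class, and validateStandIn is a simplified stand-in for the real ValidateShellCommand.

```go
package main

import (
	"fmt"
	"strings"
)

// validateStandIn is a simplified stand-in for ValidateShellCommand.
func validateStandIn(cmd string) error {
	if strings.ContainsAny(cmd, "'\"\\$`;|&(){}<>\x00\r\n") {
		return fmt.Errorf("forbidden shell metacharacter in %q", cmd)
	}
	return nil
}

// maliciousPatterns holds one hypothetical payload per class named above.
var maliciousPatterns = []struct{ name, command string }{
	{"semicolon-chain", "apachectl graceful; id"},
	{"pipe", "apachectl graceful | tee /tmp/out"},
	{"command substitution", "apachectl $(id)"},
	{"backtick", "apachectl `id`"},
	{"background spawn", "apachectl graceful & id"},
	{"output redirect", "apachectl graceful > /tmp/out"},
}

// wronglyAccepted returns the names of patterns the validator lets through.
// An empty result means the contract holds.
func wronglyAccepted() []string {
	var out []string
	for _, p := range maliciousPatterns {
		if validateStandIn(p.command) == nil {
			out = append(out, p.name)
		}
	}
	return out
}

func main() {
	fmt.Println(len(wronglyAccepted())) // 0
}
```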
CI guard
========
scripts/ci-guards/no-sh-c-in-connectors.sh greps for any future
`(exec\.Command(Context)?|\.Execute)\([^)]*"sh"[[:space:]]*,[[:space:]]*"-c"`
under internal/connector/target/*.go (excluding _test.go and
comment lines). Auto-picked-up by the existing
.github/workflows/ci.yml regression-guards loop.
ALLOWLIST is empty post-Phase-7. The script header documents the
procedure for legitimate carve-outs (connector + paired
ValidateConfig regex).
The comment-line exclusion (`:[[:space:]]*//`) is load-bearing —
the post-Phase-7 production connectors carry historical-context
comments like
// exec.CommandContext(ctx, "sh", "-c", command) — the legacy
// shape pre-Phase-7 ...
explaining the migration. Those comments would otherwise
false-positive the guard.
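The guard itself is a shell script, but its matching logic can be exercised in isolation. The sketch below applies the same regular expression with Go's regexp package and approximates the comment-line exclusion by trimming leading whitespace (the script filters grep -n output instead, so the two are not byte-for-byte equivalent). The function name flagged is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// shC is the guard pattern quoted above, unchanged.
var shC = regexp.MustCompile(
	`(exec\.Command(Context)?|\.Execute)\([^)]*"sh"[[:space:]]*,[[:space:]]*"-c"`)

// flagged reports whether a source line would trip the guard: it must match
// the pattern and must not be a // comment line.
func flagged(line string) bool {
	if strings.HasPrefix(strings.TrimSpace(line), "//") {
		return false
	}
	return shC.MatchString(line)
}

func main() {
	lines := []string{
		`return exec.CommandContext(ctx, "sh", "-c", command).CombinedOutput()`,
		`out, err := c.executor.Execute(ctx, "sh", "-c", cmd)`,
		`// exec.CommandContext(ctx, "sh", "-c", command) was the legacy shape`,
		`return exec.CommandContext(ctx, argv[0], argv[1:]...).CombinedOutput()`,
	}
	for _, l := range lines {
		fmt.Println(flagged(l))
	}
}
```

The second line is the javakeystore-style shape that the original audit missed, which is why the pattern includes the `\.Execute` alternative.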
Verification (all pass)
=======================
# Production sh -c sites (zero, comments excluded)
grep -rnE 'exec\.Command(Context)?\([^,]+,\s*"sh"\s*,\s*"-c"' \
internal/connector/target/ --include='*.go' --exclude='*_test.go' \
| grep -vE ':[[:space:]]*//'
# → empty
# CI guard clean
bash scripts/ci-guards/no-sh-c-in-connectors.sh
# → "no-sh-c-in-connectors: clean — 0 sh -c sites in production connector code"
# All target connector packages green (not just the 5 modified)
go test ./internal/connector/target/... -count=1
# → 18/18 packages ok
# Validation package green
go test ./internal/validation/... -count=1
# → ok
# gofmt clean
gofmt -l internal/validation/ internal/connector/target/ scripts/
# → empty
# go vet clean
go vet ./internal/validation/... ./internal/connector/target/...
# → empty
Files changed (11, including 1 new):
internal/validation/command.go (+37 -0)
internal/validation/command_test.go (+109 -0)
internal/connector/target/nginx/nginx.go (+22 -2)
internal/connector/target/apache/apache.go (+11 -1)
internal/connector/target/haproxy/haproxy.go (+11 -1)
internal/connector/target/postfix/postfix.go (+18 -1)
internal/connector/target/javakeystore/javakeystore.go (+18 -2)
internal/connector/target/javakeystore/javakeystore_test.go (+11 -2)
internal/connector/target/apache/apache_test.go (+42 -0)
internal/connector/target/haproxy/haproxy_test.go (+41 -0)
scripts/ci-guards/no-sh-c-in-connectors.sh (new, 93 lines)
Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H2
scripts/ci-guards/ — Regression-guard scripts
Each `<id>.sh` script in this directory guards one closed audit finding
against regression. CI runs the full set on every push via the
Regression guards step in .github/workflows/ci.yml. Operators can
run any script locally:
bash scripts/ci-guards/G-3-env-docs-drift.sh
Contract
Every script in this directory MUST:
- Exit with code 0 on a clean repo (no regression present).
- Exit with a non-zero code on regression, with a `::error::` annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
- Be runnable from repo root via `bash scripts/ci-guards/<id>.sh` with NO arguments and NO env-var requirements. The CI loop step (`for g in scripts/ci-guards/*.sh; do bash "$g"; done`) iterates every `.sh` here without args; any script that requires an arg or env var WILL fail in that loop.
- Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
- Use `set -e` early to fail-fast on internal command errors.
- Produce no output on the happy path beyond a final `echo "<id>: clean."` confirmation line.
Helpers vs guards
Scripts that consume input artifacts (a test-output log, a
coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are
HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.
Current helpers:
- `scripts/vendor-e2e-skip-check.sh`: consumes `test-output.log` arg from the deploy-vendor-e2e job
- `scripts/coverage-pr-comment.sh`: consumes `coverage.out` + `PR_NUMBER` + `GH_TOKEN` env from the go-build-and-test job
- `scripts/check-coverage-thresholds.sh`: consumes `coverage.out` and `.github/coverage-thresholds.yml`
Adding a new guard
- Drop a new `<id>.sh` in this directory with the head-comment block describing the audit finding it closes.
- Make it executable: `chmod +x scripts/ci-guards/<id>.sh`.
- Verify it fails on a deliberate regression and passes on clean repo.
- CI auto-picks up new scripts via the `for g in scripts/ci-guards/*.sh` loop in the `Regression guards` step; no ci.yml change required.
Guards in this directory
Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.
Per-finding regression guards
| ID | Finding | Catches |
|---|---|---|
| `G-1-jwt-auth-literal` | G-1 JWT silent auth downgrade | "jwt" literal in additive auth-type surfaces |
| `L-001-insecure-skip-verify` | L-001 unjustified InsecureSkipVerify | `InsecureSkipVerify: true` without `//nolint:gosec` |
| `H-001-bare-from` | H-001 (CWE-829) tag-swap attack | Bare FROM line without @sha256 digest pin |
| `M-012-no-root-user` | M-012 (CWE-250) container-as-root | Dockerfile missing terminal `USER <non-root>` |
| `H-009-readme-jwt` | H-009 README JWT advertising | README.md re-introducing JWT-as-supported claim |
| `G-2-api-key-hash-json` | G-2 cat-s5-apikey_leak | api_key_hash in JSON-emitting surface |
| `U-2-plaintext-healthcheck` | U-2 healthcheck protocol mismatch | Plaintext http:// in HEALTHCHECK directive |
| `U-3-migration-mount` | U-3 seed initdb schema drift | Migration file mounted into postgres initdb |
| `D-1-D-2-statusbadge-phantom` | D-1 + D-2 dead keys + TS phantoms | StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields |
| `L-1-bulk-action-loop` | L-1 client-side bulk loops | for ... await triggerRenewal/updateCertificate in CertificatesPage |
| `B-1-orphan-crud` | B-1 orphan-CRUD client fns | 8 update/create/delete fns lose their page consumer |
| `S-2-strings-contains-err` | S-2 brittle error-dispatch | `strings.Contains(err.Error(), "not found"\|"violates foreign key")` in handlers |
| `G-3-env-docs-drift` | G-3 env-var docs drift | `CERTCTL_*` env var defined OR documented but not both |
| `test-naming-convention` | I-001-extended | func TestXxx (lowercase first letter); Go silently skips |
| `S-1-hardcoded-source-counts` | S-1 stale numeric prose | Hardcoded "N issuer connectors" / "N MCP tools" in README + docs |
| `P-1-documented-orphan-fns` | P-1 documented orphans | 16 read-fn names removed from client.ts exports |
| `T-1-frontend-page-coverage` | T-1 untested frontend pages | New page in web/src/pages/ without sibling .test.tsx and not on the deferred allowlist |
| `bundle-8-L-015-target-blank-rel-noopener` | L-015 (CWE-1022) reverse-tabnabbing | `target="_blank"` without `rel="noopener noreferrer"` |
| `bundle-8-L-019-dangerously-set-inner-html` | L-019 (CWE-79) XSS | dangerouslySetInnerHTML outside safeHtml.ts |
| `bundle-8-M-009-bare-usemutation` | M-009 + M-029 mutation contract | Bare useMutation() outside useTrackedMutation wrapper |
| `H-1-encryption-key-min-length` | H-1 closure follow-up (post-Phase-5 surfacing) | CERTCTL_CONFIG_ENCRYPTION_KEY literal in any deploy/docker-compose*.yml shorter than the 32-byte floor enforced by internal/config/config.go::Validate() |
| `test-compose-scep-coherence` | post-Phase-5 surfacing of dead SCEP test config | CERTCTL_SCEP_ENABLED=true in test compose without (a) a CI job that runs the SCEP integration test, (b) the ra.crt + ra.key + intune_trust_anchor.pem fixtures committed to deploy/test/fixtures/, AND (c) the matching volume mount |
Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)
These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.
| ID | Item | Catches |
|---|---|---|
| `complete-path-config-coverage` | post-v2.1.0 / item-1 | "Lying field": a `CERTCTL_*` env var defined in internal/config/config.go that no consumer outside internal/config/ actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at internal/config/coverage_test.go. |
| `doc-rot-detector` | post-v2.1.0 / item-5 | Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. docs/archive/ allowlisted in bulk. |
The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.
The fourth Bundle artifact (internal/ciparity/) is Go tests, not shell guards — runs under the standard Go test step. Pins the MCP tool catalogue floor + naming convention; reports CLI/MCP/OpenAPI surface counts as a trend metric.
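The two-threshold rule in the doc-rot-detector row can be sketched as follows. This only illustrates the arithmetic: how the real guard obtains each document's timestamp is not stated here, so the sketch takes both times as inputs, and the names classify and classifyDays are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	warnAfter = 90 * 24 * time.Hour  // older than this: warn (yellow)
	failAfter = 120 * 24 * time.Hour // older than this: fail (red)
)

// classify measures a doc's age against the HEAD commit time rather than
// the wall clock, so the same commit always yields the same verdict.
func classify(headCommit, docModified time.Time) string {
	age := headCommit.Sub(docModified)
	switch {
	case age > failAfter:
		return "fail"
	case age > warnAfter:
		return "warn"
	default:
		return "ok"
	}
}

// classifyDays is a convenience wrapper for a doc that is `days` days older
// than an arbitrary fixed HEAD timestamp.
func classifyDays(days int) string {
	head := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return classify(head, head.Add(-time.Duration(days)*24*time.Hour))
}

func main() {
	for _, d := range []int{30, 100, 121} {
		fmt.Println(d, classifyDays(d))
	}
}
```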
Running the full set locally
for g in scripts/ci-guards/*.sh; do
echo "=== $(basename "$g") ==="
bash "$g" || echo " FAILED"
done