fix(security): close BUNDLE 2 — safe first run, demo mode, agent bootstrap

Bundle 2 closure (2026-05-12 acquisition diligence audit). Closes the
"docker compose up == accidental production" hazard: pre-Bundle-2 the
base deploy/docker-compose.yml WAS the demo path (AUTH_TYPE=none +
DEMO_MODE_ACK=true + KEYGEN_MODE=server + DEMO_SEED=true + literal
change-me-... placeholder creds), the README claimed "drop the demo
overlay for a clean install", and ENVIRONMENTS.md table documented
auth-type default as api-key — three contradictory stories layered on
the same compose file.

Source findings closed:
  R2 R3 C1 D9 finding-2 S9               (repo audit)
  SEC-H2 SEC-M1 SEC-M3 OPS-M3 LOW-5 HIGH-6 (cowork audit)

Compose split (deploy/docker-compose.yml + deploy/docker-compose.demo.yml):
The base now ships production-shaped — no AUTH_TYPE override, no
KEYGEN_MODE override, no DEMO_MODE_ACK, no DEMO_SEED, no literal
placeholder fallbacks. POSTGRES_PASSWORD / CERTCTL_AUTH_SECRET /
CERTCTL_CONFIG_ENCRYPTION_KEY / CERTCTL_API_KEY / CERTCTL_AGENT_ID
must come from deploy/.env (sample template in deploy/.env.example +
root .env.example). The demo overlay carries the full demo posture
(every env var + every placeholder credential) so the
`-f docker-compose.demo.yml` one-flag flip remains a zero-config
populated-dashboard path.

Fail-closed startup guards (internal/config/config.go::Validate):
Three new gates layered on the existing HIGH-12 demo-mode listen-bind
guard. All three exempt CERTCTL_DEMO_MODE_ACK=true so the demo overlay
keeps working:
  • HIGH-6:  AUTH_SECRET = "change-me-in-production"        → refuse
  • HIGH-6:  CONFIG_ENCRYPTION_KEY = "change-me-32-char..." → refuse
  • LOW-5:   CORS_ORIGINS contains "*"  (CWE-942 + CWE-352) → refuse

Visible DEMO MODE banner (cmd/server/main.go): every boot under
DEMO_MODE_ACK=true now emits a prominent WARN line with a 6-step
production-promotion checklist. The 2026-04-19 incident (a screenshot
run that kept running for three days) drove this; the per-startup
banner makes the posture unmissable in any log scraper.

Agent enrollment doc alignment:
  • docs/reference/configuration.md L83: corrected the non-existent
    URL `POST /api/v1/agents/register` to the real route
    `POST /api/v1/agents`; added the bootstrap-token note and the
    install-agent.sh handoff sequence.
  • docs/reference/architecture.md L154: replaced "agents register
    themselves at first heartbeat" (false — cmd/agent/main.go fail-
    fasts when CERTCTL_AGENT_ID is unset) with the actual two-step
    operator-driven flow (REST or GUI registration first, returned ID
    fed to install-agent.sh second).

Tests + CI guard:
  • 9 new TestValidate_Bundle2_* cases in internal/config/config_test.go
    covering: placeholder-secret refused + demo-ack exempt; placeholder
    encryption-key refused + demo-ack exempt; real key not mistaken for
    placeholder; wildcard CORS refused + demo-ack exempt; wildcard mixed
    into a concrete allowlist still refused; concrete allowlist accepted.
  • scripts/ci-guards/B2-compose-base-no-demo-env.sh: greps the base
    compose for any of the demo-mode env vars + placeholder credentials.
    Comments stripped before checking so the narrative header in the
    base file can still reference the overlay's posture in prose.

Cold-DB CI smoke (.github/workflows/ci.yml::cold-db-compose-smoke):
Switched to layering -f docker-compose.demo.yml on top of the base —
the new production base requires real env vars the smoke doesn't have,
and the smoke's purpose (catch migration-on-cold-DB regressions + the
bootstrap-token mint path) is orthogonal to which auth posture the
boot lands in.

Receipts:
  • Current first-run truth table
        compose flag                                  → posture
        -f docker-compose.yml                          (production)
                                                       → requires .env;
                                                       fail-fasts on
                                                       missing AUTH_SECRET
                                                       / CONFIG_ENCRYPTION
                                                       _KEY / POSTGRES
                                                       _PASSWORD; agent
                                                       fail-fasts on
                                                       missing AGENT_ID
        -f docker-compose.yml -f docker-compose.demo.yml  (demo)
                                                       → zero-config;
                                                       AUTH_TYPE=none +
                                                       DEMO_MODE_ACK=true
                                                       + KEYGEN=server +
                                                       DEMO_SEED=true;
                                                       boot banner WARN
        -f docker-compose.yml -f docker-compose.dev.yml   (dev)
                                                       → base + PgAdmin
                                                       + debug logging
        -f docker-compose.test.yml                     (test, standalone)
                                                       → production-shape
                                                       posture, real CA
                                                       backends
  • Verification (PATH=/tmp/go/bin export GO* paths to /tmp):
        gofmt -l                                      # clean (no diffs)
        go vet ./internal/config ./cmd/server         # clean
        go test -short -count=1 ./internal/config/... # PASS (cumulative +
                                                       all 9 new Bundle 2
                                                       cases green)
        go test -short -count=1                       # PASS (no regression
            ./internal/connector/target/configcheck    in the Bundle 1 -
                                                       closure tests)
        go build ./cmd/server ./cmd/agent             # clean
            ./cmd/cli ./cmd/mcp-server
        bash scripts/ci-guards/B2-compose-base-no-demo-env.sh  # clean
        bash scripts/ci-guards/H-1-encryption-key-min-length.sh # clean
        bash scripts/ci-guards/G-3-env-docs-drift.sh           # clean

Remaining operator warnings (not blocking; tracked in CLAUDE.md
"Open decisions"):
  • The first `docker compose -f docker-compose.yml up -d` against a
    pre-Bundle-2 .env (placeholder values still in place) will now
    fail-fast. This is the intended posture but operators upgrading
    from v2.0.x via .env-from-old-master need to rotate before
    upgrading. The CHANGELOG note for the v2.1.0 release should
    call this out alongside Auth Bundle 2's other breaking changes.

Audit-Closes: BUNDLE-2 R2 R3 C1 D9 S9 SEC-H2 SEC-M1 SEC-M3 OPS-M3 LOW-5 HIGH-6
This commit is contained in:
shankar0123
2026-05-13 00:14:59 +00:00
parent d60a0ac297
commit a849c8b8cf
13 changed files with 645 additions and 90 deletions
+101
View File
@@ -0,0 +1,101 @@
#!/usr/bin/env bash
# scripts/ci-guards/B2-compose-base-no-demo-env.sh
#
# Bundle 2 closure (2026-05-12) — base compose must stay production-shaped.
#
# Pre-Bundle-2 the base file `deploy/docker-compose.yml` shipped with the
# demo-mode env vars baked in (CERTCTL_AUTH_TYPE=none + DEMO_MODE_ACK=true +
# KEYGEN_MODE=server + DEMO_SEED=true + literal change-me placeholder
# credentials). The README, ENVIRONMENTS.md, and operator intuition all
# said "drop the demo overlay for a clean install" — but dropping the
# overlay still produced a demo-shape stack because the demo posture
# lived in the base. The Bundle 2 closure (cowork/bundle-2-prompt.md)
# moved every demo-mode env var out of the base into the demo overlay.
#
# This guard catches any future regression that would re-introduce a
# demo-mode env var into the base file. The signals checked map 1:1 to
# the env vars the overlay now owns:
#
# CERTCTL_AUTH_TYPE: none — demo-mode synthetic admin
# CERTCTL_DEMO_MODE_ACK: "true" — the HIGH-12 bypass acknowledgment
# CERTCTL_KEYGEN_MODE: server — demo-only server-side keygen
# CERTCTL_DEMO_SEED: "true" — 180-day simulated history seeder
# change-me-in-production — literal placeholder API secret
# change-me-32-char-encryption-key — literal placeholder encryption key
#
# Per the contract documented in scripts/ci-guards/README.md:
# bare callable, no args, no env, exit 0 on clean.
set -e
GUARD_NAME="B2-compose-base-no-demo-env"
BASE="deploy/docker-compose.yml"
if [ ! -f "$BASE" ]; then
echo "${GUARD_NAME}: ${BASE} not found — refuse to skip silently."
exit 1
fi
# The patterns below match the literal Bundle-2-overlay-owned env values
# anywhere in the base compose file. Comments are excluded so the same
# strings can still appear in documentation-style # comments inside the
# file (which is exactly what we want — the base file still references
# the overlay's name and posture in its header).
# grep helpers: -E for ERE, -v '^\s*#' to drop YAML comments, -F to
# match literal strings (no regex meta-chars in sentinels). The base
# compose still has narrative-comment references to the overlay's
# posture (CERTCTL_AUTH_TYPE=none ... etc.) so we can't grep the file
# raw — strip comments first.
stripped=$(sed -E 's/^\s*#.*$//' "$BASE" \
| sed -E 's/^([^#]*)#.*$/\1/')
failed=0
# Pattern 1: CERTCTL_AUTH_TYPE: none (with the YAML "key: value" shape).
if echo "$stripped" | grep -qE '^\s*CERTCTL_AUTH_TYPE\s*:\s*none\s*$'; then
echo "::error file=${BASE}::CERTCTL_AUTH_TYPE: none belongs in deploy/docker-compose.demo.yml (the demo overlay), not the base compose. Bundle 2 closure: the base must boot production-shaped. See cowork/bundle-2-prompt.md."
failed=1
fi
# Pattern 2: CERTCTL_DEMO_MODE_ACK: "true" (the HIGH-12 bypass ACK).
if echo "$stripped" | grep -qE '^\s*CERTCTL_DEMO_MODE_ACK\s*:\s*"?true"?\s*$'; then
echo "::error file=${BASE}::CERTCTL_DEMO_MODE_ACK: \"true\" belongs in deploy/docker-compose.demo.yml. Setting it in the base disables the HIGH-12 fail-closed guard on every deploy that uses the base alone."
failed=1
fi
# Pattern 3: CERTCTL_KEYGEN_MODE: server (demo-only setting).
if echo "$stripped" | grep -qE '^\s*CERTCTL_KEYGEN_MODE\s*:\s*server\s*$'; then
echo "::error file=${BASE}::CERTCTL_KEYGEN_MODE: server belongs in deploy/docker-compose.demo.yml. Production deploys must use the code default 'agent' so private keys never leave agent infrastructure."
failed=1
fi
# Pattern 4: CERTCTL_DEMO_SEED: "true" (180-day simulated history seeder).
if echo "$stripped" | grep -qE '^\s*CERTCTL_DEMO_SEED\s*:\s*"?true"?\s*$'; then
echo "::error file=${BASE}::CERTCTL_DEMO_SEED: \"true\" belongs in deploy/docker-compose.demo.yml. The 180-day demo-history seeder must not run on the production-shaped base."
failed=1
fi
# Pattern 5+6: literal change-me placeholder credentials.
# Use fgrep against the stripped (no-comment) content so the narrative
# header in deploy/docker-compose.yml can still mention the sentinels.
if echo "$stripped" | grep -qF 'change-me-in-production'; then
echo "::error file=${BASE}::literal \"change-me-in-production\" placeholder belongs in deploy/docker-compose.demo.yml. The base compose Validate() now refuses to start with this placeholder outside demo mode."
failed=1
fi
if echo "$stripped" | grep -qF 'change-me-32-char-encryption-key'; then
echo "::error file=${BASE}::literal \"change-me-32-char-encryption-key\" placeholder belongs in deploy/docker-compose.demo.yml. The base compose Validate() now refuses to start with this placeholder outside demo mode."
failed=1
fi
if [ "$failed" -ne 0 ]; then
echo ""
echo "${GUARD_NAME}: FAILED — base compose has regressed into demo-mode territory."
echo " Move the offending env vars into deploy/docker-compose.demo.yml and"
echo " re-run: bash scripts/ci-guards/${GUARD_NAME}.sh"
exit 1
fi
echo "${GUARD_NAME}: clean."