diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index ec7d245..4333713 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -656,6 +656,99 @@ jobs:
           fi
           echo "S-1 stale-counts guardrail: clean."
 
+      - name: Forbidden env-var docs drift regression guard (G-3)
+        # G-3 master closed cat-g-163dae19bc59 (phantom docs-only env
+        # vars in features.md), cat-g-b8f8f8796159 (6 config-only
+        # env vars never documented), and cat-g-renewal_check_interval_rename_drift
+        # (features.md still advertised the pre-rename
+        # CERTCTL_RENEWAL_CHECK_INTERVAL after it was renamed to
+        # CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL). This step runs
+        # `comm` both ways (`-13` for docs-only, `-23` for config-only) between
+        # the env vars defined in Go source (config.go + cmd/{agent,cli,mcp-server,server}
+        # + deploy/test fixtures + ACME DNS-01 script env exports) and
+        # the env vars mentioned in README + docs/ + deploy/helm/.
+        #
+        # Allowlist: env vars that are documented as integration-
+        # surface contracts (script env exports for ACME DNS-01,
+        # OpenSSL CA scripts, StepCA per-issuer-config-blob fields,
+        # Webhook per-notifier-config-blob fields, ACME EAB, audit
+        # exclusion, demo-stack overrides) but not consumed directly
+        # by config.go. The comment after the ALLOWED list justifies
+        # each group of entries; if you add an entry, extend it.
+        #
+        # See coverage-gap-audit-2026-04-24-v5/unified-audit.md
+        # cat-g-* for closure rationale.
+        run: |
+          set -e
+          # Defined: config.go + agent + cli + mcp-server + server cmds + test fixtures + ACME DNS export
+          {
+            grep -oE '"CERTCTL_[A-Z_]+"' internal/config/config.go | sed -E 's/"(CERTCTL_[A-Z_]+)"/\1/'
+            grep -rhoE '"CERTCTL_[A-Z_]+"' cmd/agent/*.go cmd/cli/*.go cmd/mcp-server/*.go cmd/server/*.go 2>/dev/null | sed -E 's/"(CERTCTL_[A-Z_]+)"/\1/'
+            grep -rhoE 'CERTCTL_[A-Z_]+' deploy/test/qa_test.go internal/connector/issuer/acme/dns.go 2>/dev/null
+          } | grep -E '^CERTCTL_' | sort -u > /tmp/g3-defined.txt
+          # Documented: README + docs + helm
+          grep -rhoE '\bCERTCTL_[A-Z_]+\b' README.md docs/ deploy/helm/ 2>/dev/null | sort -u > /tmp/g3-docs.txt
+          # Allowlist of env vars documented as external integration contracts.
+          # The comment block after the list justifies each group; if you
+          # add an entry, extend that justification.
+          ALLOWED='^(
+            CERTCTL_OPENSSL_SIGN_SCRIPT|
+            CERTCTL_OPENSSL_REVOKE_SCRIPT|
+            CERTCTL_OPENSSL_CRL_SCRIPT|
+            CERTCTL_OPENSSL_TIMEOUT_SECONDS|
+            CERTCTL_STEPCA_URL|
+            CERTCTL_STEPCA_FINGERPRINT|
+            CERTCTL_STEPCA_PROVISIONER|
+            CERTCTL_STEPCA_PROVISIONER_NAME|
+            CERTCTL_STEPCA_PROVISIONER_KEY|
+            CERTCTL_STEPCA_PROVISIONER_JWK|
+            CERTCTL_STEPCA_PROVISIONER_PASSWORD|
+            CERTCTL_STEPCA_PASSWORD|
+            CERTCTL_STEPCA_KEY_PATH|
+            CERTCTL_STEPCA_ROOT_CA|
+            CERTCTL_WEBHOOK_URL|
+            CERTCTL_WEBHOOK_SECRET|
+            CERTCTL_ACME_EAB_KID|
+            CERTCTL_ACME_EAB_HMAC|
+            CERTCTL_ACME_DNS_PROPAGATION_WAIT|
+            CERTCTL_AUDIT_EXCLUDE_PATHS|
+            CERTCTL_TLS_|
+            CERTCTL_TLS_INSECURE_SKIP_VERIFY|
+            CERTCTL_SERVER_CA_BUNDLE_PATH|
+            CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY|
+            CERTCTL_QA_[A-Z_]+
+          )$'
+          # ^ The CERTCTL_OPENSSL_* / CERTCTL_STEPCA_* / CERTCTL_WEBHOOK_* /
+          # CERTCTL_ACME_EAB_* / CERTCTL_ACME_DNS_PROPAGATION_WAIT /
+          # CERTCTL_AUDIT_EXCLUDE_PATHS / CERTCTL_TLS_* / CERTCTL_SERVER_* /
+          # CERTCTL_QA_* sets are documented integration-surface contracts
+          # (script invocations, per-issuer config-blob field names,
+          # per-notifier config-blob field names, demo-stack overrides,
+          # test fixtures), not server-side env vars in config.go.
+          # The audit's "37 docs-only" count over-flagged these; the
+          # closure narrows the gate to the specific drift sites
+          # (renewal-interval rename + 6 config-only) and allowlists
+          # the documented external contracts here.
+          ALLOWED_FLAT=$(echo "$ALLOWED" | tr -d '\n ')
+          DOCS_ONLY=$(comm -13 /tmp/g3-defined.txt /tmp/g3-docs.txt | grep -vE "$ALLOWED_FLAT" || true)
+          CONFIG_ONLY=$(comm -23 /tmp/g3-defined.txt /tmp/g3-docs.txt || true)
+          if [ -n "$DOCS_ONLY" ]; then
+            echo "G-3 regression: env var(s) mentioned in docs but not defined in Go source AND not in the documented integration-surface allowlist:"
+            echo "$DOCS_ONLY"
+            echo ""
+            echo "Either delete from docs (phantom/typo) or add to config.go,"
+            echo "or add to the ALLOWED list and extend its justification comment."
+            exit 1
+          fi
+          if [ -n "$CONFIG_ONLY" ]; then
+            echo "G-3 regression: env var(s) defined in Go source but never documented:"
+            echo "$CONFIG_ONLY"
+            echo ""
+            echo "Add an entry to docs/features.md (or another canonical doc) so operators can find it."
+            exit 1
+          fi
+          echo "G-3 env-var docs drift guardrail: clean."
+
   helm-lint:
     name: Helm Chart Validation
     runs-on: ubuntu-latest
diff --git a/docs/features.md b/docs/features.md
index 6e93c48..6c2f0f3 100644
--- a/docs/features.md
+++ b/docs/features.md
@@ -149,7 +149,7 @@ Every API call is recorded to the immutable audit trail. Best-effort (non-blocki
 
 
 
-The renewal scheduler runs every hour (configurable via `CERTCTL_RENEWAL_CHECK_INTERVAL`). For each certificate approaching expiration:
+The renewal scheduler runs every hour (configurable via `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL`). For each certificate approaching expiration:
 
 1. Checks ACME ARI (RFC 9773) if available — CA-directed renewal timing takes priority
 2. Falls back to threshold-based logic using per-policy `alert_thresholds_days` (default `[30, 14, 7, 0]`)
@@ -1114,12 +1114,12 @@ Single SQL `UNION` query replaces the previous "fetch all, filter in Go" approac
 
 | Loop | Default Interval | Always-on | Env Var | Description |
 |---|---|---|---|---|
-| Renewal check | 1 hour | Yes | — | Check expiring certs, query ARI, create renewal jobs |
-| Job processor | 30 seconds | Yes | — | Process pending jobs |
+| Renewal check | 1 hour | Yes | `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL` | Check expiring certs, query ARI, create renewal jobs |
+| Job processor | 30 seconds | Yes | `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | Process pending jobs |
 | Job retry | 5 minutes | Yes | `CERTCTL_SCHEDULER_RETRY_INTERVAL` | Retry Failed jobs (I-001) |
-| Job timeout reaper | 10 minutes | Yes | `CERTCTL_JOB_TIMEOUT_INTERVAL` | Fail AwaitingCSR/AwaitingApproval jobs past timeout (I-003) |
-| Agent health check | 2 minutes | Yes | — | Check agent heartbeat staleness |
-| Notification processor | 1 minute | Yes | — | Send queued notifications |
+| Job timeout reaper | 10 minutes | Yes | `CERTCTL_JOB_TIMEOUT_INTERVAL` (per-state thresholds: `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT`, `CERTCTL_JOB_AWAITING_CSR_TIMEOUT`) | Fail AwaitingCSR/AwaitingApproval jobs past timeout (I-003) |
+| Agent health check | 2 minutes | Yes | `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | Check agent heartbeat staleness |
+| Notification processor | 1 minute | Yes | `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | Send queued notifications |
 | Notification retry | 2 minutes | Yes | `CERTCTL_NOTIFICATION_RETRY_INTERVAL` | Exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005) |
 | Short-lived expiry check | 30 seconds | Yes | — | Mark short-lived certs expired |
 | Network scan | 6 hours | Opt-in | `CERTCTL_NETWORK_SCAN_ENABLED` | Run network discovery scans |
@@ -1369,7 +1369,9 @@ Config via `values.yaml`. Secrets for API key, database password, SMTP password.
 
 
 
-21 tables across 10 numbered migrations. PostgreSQL 16. `database/sql` + `lib/pq` (no ORM). TEXT primary keys with human-readable prefixed IDs.
+PostgreSQL 16, `database/sql` + `lib/pq` (no ORM). TEXT primary keys with human-readable prefixed IDs. Table and migration counts are deliberately not hardcoded here; re-derive them at release time with the commands in the "At a Glance" table at the top of this doc.
+
+The migration runner reads SQL files from `./migrations/` by default; set `CERTCTL_DATABASE_MIGRATIONS_PATH` when running certctl from a non-standard layout (e.g. a Helm deployment that mounts migrations at `/etc/certctl/migrations/`).
 
 ### Migrations
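The direction of each `comm` call in the G-3 step is easy to get backwards. The sketch below reproduces the two set differences in isolation, assuming GNU or BSD `comm`/`sort`; the variable names (`CERTCTL_ALPHA`, `CERTCTL_GAMMA`, `CERTCTL_QA_SEED`) are hypothetical placeholders, not real certctl settings, and the one-entry allowlist stands in for the full `ALLOWED` regex. `comm -23` keeps lines found only in the first file (defined but undocumented), `comm -13` keeps lines found only in the second (documented but undefined), and the allowlist filter applies to the docs-only side alone.

```shell
#!/usr/bin/env bash
# Sketch of the G-3 set differences with hypothetical placeholder names.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# comm(1) requires both inputs sorted under the same collation; pin it.
export LC_ALL=C

# "Defined in Go source" stand-in.
printf '%s\n' CERTCTL_ALPHA CERTCTL_BETA | sort -u > "$workdir/defined.txt"
# "Mentioned in docs" stand-in; CERTCTL_QA_SEED plays an allowlisted contract.
printf '%s\n' CERTCTL_BETA CERTCTL_GAMMA CERTCTL_QA_SEED | sort -u > "$workdir/docs.txt"

# -23 drops columns 2 and 3: lines only in the FIRST file (never documented).
config_only=$(comm -23 "$workdir/defined.txt" "$workdir/docs.txt")

# -13 drops columns 1 and 3: lines only in the SECOND file (never defined),
# then the allowlist regex removes documented external contracts.
allowed='^(CERTCTL_QA_[A-Z_]+)$'
docs_only=$(comm -13 "$workdir/defined.txt" "$workdir/docs.txt" | grep -vE "$allowed" || true)

echo "config-only: $config_only"
echo "docs-only:   $docs_only"
```

The sketch pins `LC_ALL=C` so it behaves the same on any machine; in the CI step, `sort -u` and `comm` already run in the same shell and therefore agree on collation.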