ci: post-Phase-2-docs-overhaul cleanup of stale guards + missing config doc

CI run on the ecb8896 push surfaced two real failures rooted in the
2026-05-04 docs overhaul:

  1. G-3 env-docs-drift caught two phantom CERTCTL_* env vars I'd
     introduced in the Phase 4 follow-on connector pages
     (CERTCTL_CA_CERT_PATH_NEW in adcs.md was a placeholder I made
     up; CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS in ejbca.md does not
     exist in source). Both removed.

  2. QA-doc Part-count drift guard tried to grep
     docs/qa-test-guide.md and docs/testing-guide.md, both of which
     were renamed/deleted in Phase 2/Phase 5. The Part-count drift
     class died with testing-guide.md (Phase 5 prune dispersed its
     content); the seed-count drift class is still live but pointed
     at the wrong path.

Fixes:

- Removed the QA-doc Part-count drift guard from ci.yml (premise
  dead) plus its standalone scripts/qa-doc-part-count.sh peer.
- Retargeted the QA-doc seed-count drift guard from
  docs/qa-test-guide.md → docs/contributor/qa-test-suite.md (the
  Phase 2 target). Updated both ci.yml inline copy and
  scripts/qa-doc-seed-count.sh.
- Updated Makefile qa-stats: target to drop the testing-guide.md
  Parts metric (file is gone).
- Updated Makefile verify-docs: target to drop the part-count step.

G-3 was also failing in the second direction (env vars defined in
config.go but never documented anywhere). 16 vars surfaced —
features.md (deleted Phase 6) and testing-guide.md (deleted Phase 5)
had been their canonical home. Created
docs/reference/configuration.md as the new home: a compact
operator-facing env-var reference covering scheduler intervals, job
lifecycle, rate limiting, audit, deploy verify, database,
agent-side, and SCEP profile binding. Added to docs/README.md
Reference table.

Doc-side updates to qa-test-suite.md to reframe its references to
the deleted testing-guide.md (it's now self-contained: the
Part-by-Part Coverage Map IS the canonical Part inventory).

Cosmetic comment-only updates in ci.yml + scripts/ci-guards/*.sh +
scripts/dev-setup.sh to point at the new audience-organized doc
paths (docs/operator/security.md, docs/operator/tls.md,
docs/reference/architecture.md, etc.) instead of the pre-Phase-2
flat layout.

Verified: all 24 ci-guards/*.sh pass locally; qa-doc-seed-count.sh
clean. Net diff: 178 additions / 112 deletions across 13 files.
One file deleted (qa-doc-part-count.sh) and one file added
(docs/reference/configuration.md).
This commit is contained in:
shankar0123
2026-05-05 04:56:26 +00:00
parent ecb8896b1c
commit 3275f9f1e0
14 changed files with 178 additions and 112 deletions
+98
View File
@@ -0,0 +1,98 @@
# Configuration Reference
> Last reviewed: 2026-05-05
Compact reference for `CERTCTL_*` environment variables consumed by
`certctl-server` and `certctl-agent`. Most operators don't need to
touch these — defaults are tuned for the common case. Reach for them
when the system's behaviour needs tuning beyond what's exposed in the
GUI / API.
This page enumerates the operator-tunable knobs that don't have a
dedicated home elsewhere. Connector-specific env vars are documented
on the per-connector pages under
[`docs/reference/connectors/`](connectors/index.md). Protocol env
vars (ACME server, EST, SCEP) are documented under
[`docs/reference/protocols/`](protocols/). TLS env vars are
documented in [`docs/operator/tls.md`](../operator/tls.md).
## Scheduler intervals
The scheduler runs N background loops; intervals are tunable for
performance / contention tuning.
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | `2m` | How often the agent-health loop scans for stale heartbeats and transitions agents to `Unhealthy` / `Offline`. |
| `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | `30s` | How often the job-processor loop dispatches `Pending` jobs to agents. |
| `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | `1m` | How often the notification-dispatcher loop fans out queued alerts to channels. |
| `CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL` | `5m` | How often the short-lived-expiry loop watches certs whose TTL is less than 1h for imminent expiry. |
For the full scheduler topology (12 loops, 8 always-on + 4 opt-in)
see [`architecture.md`](architecture.md) "Scheduler topology".
## Job lifecycle
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_JOB_AWAITING_CSR_TIMEOUT` | `24h` | How long a job stays in `AwaitingCSR` before the scheduler marks it `Failed` (the agent never picked it up). |
## Rate limiting
The control plane API is rate-limited by default; tune for
high-volume environments (mass-rotation events, bulk imports).
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_RATE_LIMIT_ENABLED` | `true` | Master toggle. Disable only for trusted-network single-tenant deploys where the API is firewall-protected. |
| `CERTCTL_RATE_LIMIT_PER_USER_RPS` | `0` (= use global default) | Per-user requests-per-second cap. Zero opts each user into the global default in `internal/api/middleware`. |
| `CERTCTL_RATE_LIMIT_PER_USER_BURST` | `0` (= use global default) | Per-user token-bucket burst size. Same opt-in semantics. |
## Audit trail
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS` | `30` | How long the audit-event flush worker waits for the buffered batch to drain before forcing a flush at shutdown. |
## Deploy verification
The deploy-hardening primitive wraps every cert deploy in
atomic-write + post-verify + rollback. These env vars tune the
post-deploy TLS verification phase.
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_VERIFY_DEPLOYMENT` | `true` | Master toggle for post-deploy TLS verify. Disable only for connectors / environments where the verify endpoint is not reachable from the agent. |
| `CERTCTL_VERIFY_DELAY` | `2s` | How long to wait after the reload command completes before the first verify-handshake attempt (gives the daemon time to pick up new keys). |
| `CERTCTL_VERIFY_TIMEOUT` | `10s` | Per-attempt TLS-handshake timeout. |
| `CERTCTL_DEPLOY_BACKUP_RETENTION` | `3` | How many `.certctl-bak.<unix-nanos>.<ext>` rollback snapshots to keep per target after a successful deploy. `0` uses the default of 3; `-1` opts out of pruning entirely. |
For the full deploy contract see
[`deployment-model.md`](deployment-model.md).
## Database
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_DATABASE_MIGRATIONS_PATH` | `./migrations` | Filesystem path to the `*.up.sql` / `*.down.sql` migration set. Override only when running `certctl-server` from a non-standard layout. |
## Agent
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_AGENT_ID` | (none — required) | The agent's unique ID, issued by `POST /api/v1/agents/register` and bundled into the agent's registration response. Pass via this env var when the agent runs as a systemd unit / container without the `-agent-id` CLI flag. |
## SCEP profile binding (single-profile back-compat)
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_SCEP_PROFILE_ID` | (empty) | Optional certificate profile ID for the legacy single-profile SCEP path. The multi-profile path uses `CERTCTL_SCEP_PROFILES=<list>` + `CERTCTL_SCEP_PROFILE_<NAME>_PROFILE_ID` instead — see [`scep-server.md`](protocols/scep-server.md). |
## Related references
- [`architecture.md`](architecture.md) — scheduler topology, system design, security model
- [`deployment-model.md`](deployment-model.md) — atomic write + verify + rollback contract
- [`operator/security.md`](../operator/security.md) — full security posture (auth, rate limits, encryption at rest)
- [`operator/tls.md`](../operator/tls.md) — control-plane TLS env vars
- Per-connector pages under [`reference/connectors/`](connectors/index.md) for connector-specific config
- Per-protocol pages under [`reference/protocols/`](protocols/) for ACME / SCEP / EST / CRL+OCSP / async-CA polling