mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 18:01:37 +00:00
docs: remove audit-bundle-flavored docs from public repo
Three docs added in Bundle 4 + Bundle 5 closure commits (709e1c9, 36840dd) were framed around acquisition-diligence audit findings and don't belong in the public-facing operator docs tree:

- docs/operator/scheduler-ha.md (Bundle 4 D2 per-loop HA truth table)
- docs/operator/rate-limit-scope.md (Bundle 4 D3 scope statement)
- docs/operator/security-bundle-5-audit-closure.md (Bundle 5 closure receipt)

Audit-bundle artifacts live in the operator's local cowork/ scratchpad, not in docs/. The underlying code closures (advisory-lock migrations, SSRF-guarded notifier transports, break-glass login limiter, MCP gating, etc.) stand; only the audit-framed documentation surface is removed.

docs/README.md: drop the two table rows that pointed at the now-deleted scheduler-ha.md + rate-limit-scope.md (added in 709e1c9, lines 77-78).
@@ -74,8 +74,6 @@ You're running certctl in production and need operational guidance.
| [Helm deployment](operator/helm-deployment.md) | Kubernetes installation via the bundled chart |
| [Performance baselines](operator/performance-baselines.md) | Operator-runnable benchmarks for regression spot checks |
| [Auth benchmarks](operator/auth-benchmarks.md) | Session + OIDC validation p99 targets and measured baselines |
| [Scheduler HA semantics](operator/scheduler-ha.md) | Per-loop HA truth table for the 15 scheduler loops; what duplicates on multi-replica |
| [Rate-limit scope](operator/rate-limit-scope.md) | Process-local vs cluster-wide rate-limit behavior, restart semantics, multi-replica mental math |
| [Legacy clients (TLS 1.2)](operator/legacy-clients-tls-1.2.md) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 |

### Runbooks
@@ -1,54 +0,0 @@
|
||||
# Rate-Limit Scope
|
||||
|
||||
> Last reviewed: 2026-05-13
|
||||
|
||||
How certctl's rate limiters behave under multi-replica and restart, and where the boundaries are. Closes Bundle 4 audit findings **MED-1** (process-local rate limits) and **MED-2** (rate-limit semantics across replicas).
|
||||
|
||||
## TL;DR
|
||||
|
||||
Every rate limiter in certctl is **process-local**: in-memory, `sync.Mutex`-guarded maps in the `internal/ratelimit` package. The effective rate limit across an N-replica deployment is **N × the configured per-replica limit**. Limiter state is lost on restart: there is no persistent ledger and no shared store. This is intentional for v2.1.0 and documented; sharing rate limits across replicas (via Redis or a Postgres-backed token bucket) is a v3 work item tracked in `WORKSPACE-ROADMAP.md`.
|
||||
|
||||
## Limiter inventory
|
||||
|
||||
The shared primitive lives at `internal/ratelimit/sliding_window.go::SlidingWindowLimiter`. It's a sliding-window log algorithm (each key holds timestamps within the configured window; on `Allow` the bucket prunes timestamps older than `now - window` and admits or rejects based on the post-prune count).
|
||||
|
||||
Five call sites exercise it across `cmd/server/main.go`:
|
||||
|
||||
| Call site | Key | Window | Cap | What it protects |
|---|---|---|---|---|
| `ocspLimiter` | source IP | 1 minute | `CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN` (default 1000) | RFC 6960 OCSP responder against scan amplification. |
| `exportLimiter` | actor ID | 1 hour | `CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR` (default 50) | `/api/v1/certificates/{id}/export` bulk-cert-pull abuse. |
| EST per-principal | CN | 24 hours | per-profile `RateLimitPerPrincipal24h` | EST RFC 7030 device enrollment abuse. |
| EST failed-auth | source IP | 1 hour | 10 attempts | EST HTTP-Basic brute force. |
| Intune dispatcher | (Subject, Issuer) | 24 hours | per-profile `PerDeviceRateLimit24h` | SCEP + Intune duplicate-enrollment cap. |
|
||||
The HTTP middleware rate-limiter (`internal/api/middleware/middleware.go::RateLimitConfig`, knobs `CERTCTL_RATE_LIMIT_RPS` + `CERTCTL_RATE_LIMIT_BURST`) is a separate token-bucket implementation but follows the same process-local scope.
|
||||
|
||||
## What is in scope
|
||||
|
||||
- **Per-process abuse mitigation**. A scanner sending 5000 requests per minute from one IP to a single replica's OCSP responder is capped at 1000 requests per minute (the default) by `ocspLimiter`. A compromised API key trying to bulk-export 1000 certs in an hour against one replica is capped at 50.
- **Bounded memory footprint**. Each limiter caps its key-tracking map at the size passed to `NewSlidingWindowLimiter` (50 000 for OCSP/export, 100 000 for EST/Intune per-device). At the cap, the eviction janitor drops the key whose most recent timestamp is oldest, so a key explosion (random source IPs from a botnet, random CNs from a misconfigured fleet) never bloats memory.
- **Restart safety**. The sliding-window state is per-process and in-memory. On a server restart the limiter state resets: an attacker who had burned through their window cap before the restart gets a fresh window afterwards. Conversely, a legitimate caller that had hit their cap right before a restart is immediately unblocked. Both directions are intentional: we don't ship a persistent state store, so the post-restart state is "fresh start".
|
||||
|
||||
## What is NOT in scope
|
||||
|
||||
- **Shared limits across replicas**. With `server.replicas: N`, an attacker hitting all replicas in parallel gets N × the per-replica cap. For the default OCSP knob (1000 requests per IP per minute) at N=3, the cluster as a whole admits up to 3000 requests per IP per minute, because each replica enforces its own 1000-request cap independently. The chart's default is N=1; operators running multi-replica should multiply published per-replica caps by the replica count to get the cluster-wide effective cap.
- **Cross-restart persistence**. See "Restart safety" above; this is by design, but operators need to know.
- **First-party (`Authorization: Bearer ...`) request rate-limit cohesion across replicas**. The middleware-level RPS/burst rate limit (`CERTCTL_RATE_LIMIT_RPS`) is also process-local. At N=3 replicas with default `RPS=50, Burst=100`, the effective cluster-wide rate is 150 rps with 300 burst.
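
The multiplication in the bullets above can be written out as a small worked example. The helper name `effectiveCap` is hypothetical and only restates the arithmetic; the per-replica values are the defaults quoted in this document. It gives an upper bound that assumes traffic from one key can reach every replica.

```go
package main

import "fmt"

// effectiveCap returns the worst-case cluster-wide cap when every replica
// enforces perReplicaCap independently (process-local limiter state).
func effectiveCap(perReplicaCap, replicas int) int {
	return perReplicaCap * replicas
}

func main() {
	const replicas = 3 // server.replicas in the Helm chart
	fmt.Println("OCSP requests per IP per minute:", effectiveCap(1000, replicas))
	fmt.Println("middleware rps:", effectiveCap(50, replicas))
	fmt.Println("middleware burst:", effectiveCap(100, replicas))
}
```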
|
||||
|
||||
## Operator guidance
|
||||
|
||||
**Single-replica deployments** (Helm chart default `server.replicas: 1`): published caps are the effective caps. No mental math.
|
||||
|
||||
**Multi-replica deployments**: multiply every published cap by `server.replicas` to get the effective per-key cap. If your threat model needs strict cluster-wide rate limits (e.g., a SOC 2 control mapping that quotes "≤ 1000 OCSP requests per IP per minute"), choose one of:

1. **Pin to a single replica** and scale vertically. This is the default v2.1.0 posture and works for substantial fleets.
2. **Front the cluster with a rate-limited gateway** (NGINX `limit_req_zone`, Envoy global rate-limit service, Cloudflare rate limiting rules) and treat the cluster-internal limiter as defense-in-depth.
3. **Wait for the v3 shared-rate-limit work** (tracked in `WORKSPACE-ROADMAP.md`). This is likely to be a Redis or Postgres-backed token bucket plumbed through the same `internal/ratelimit` package so call sites stay unchanged.
|
||||
|
||||
## Source-of-truth references
|
||||
|
||||
- Shared primitive: `internal/ratelimit/sliding_window.go` (the package comment at the top is the canonical algorithm + concurrency reference).
|
||||
- Middleware rate limiter: `internal/api/middleware/middleware.go::RateLimitConfig`.
|
||||
- Call sites: `grep -n "ratelimit.NewSlidingWindowLimiter\|RateLimitConfig" cmd/server/main.go`.
|
||||
- Configuration knobs: `docs/reference/configuration.md` (search "rate limit").
|
||||
@@ -1,70 +0,0 @@
|
||||
# Scheduler HA Semantics
|
||||
|
||||
> Last reviewed: 2026-05-13
|
||||
|
||||
What happens when you run more than one `certctl-server` replica? Which scheduler loops are safe to run on every replica simultaneously, which need leader election, and which silently duplicate work today?
|
||||
|
||||
This page closes Bundle 4 audit findings **D8** (singleton loop ambiguity) and **HIGH-1 + MED-2** (HA semantics across the scheduler surface). It is a per-loop inventory of the 15 scheduler loops in `internal/scheduler/scheduler.go`, classified by HA-safety.
|
||||
|
||||
## TL;DR
|
||||
|
||||
The only loops that are HA-safe today via `FOR UPDATE SKIP LOCKED` job claiming are `jobProcessorLoop` and `jobRetryLoop`. Every other loop is *intra-process* idempotent (a per-replica `sync/atomic.Bool` guard prevents a single replica from running the same loop twice at once) but is *cross-replica* duplicative — two replicas tick at the same interval and both do the work.
|
||||
|
||||
For eight of the thirteen non-job-claim loops this is harmless: the work is DB scanning that produces idempotent side effects (e.g., create-job-if-not-exists, send-notification-once-per-event-via-DB-ledger), and duplicate execution wastes CPU but cannot corrupt data. For the other five (`notificationProcessLoop`, `notificationRetryLoop`, `digestLoop`, `crlGenerationLoop`, `cloudDiscoveryLoop`) duplicate execution produces observable duplicate side effects: duplicate emails, duplicate webhooks, duplicate CRL signing and writes, duplicate cloud API calls. v2.1.0 supports `server.replicas > 1` for read availability and API throughput, but operators running multi-replica should accept these duplication classes or pin replicas to 1 until the leader-election work lands.
|
||||
|
||||
True leader-election via Postgres advisory lock or Kubernetes lease is tracked in `WORKSPACE-ROADMAP.md` as a v3 work item.
|
||||
|
||||
## Per-loop inventory
|
||||
|
||||
The 15 loops live in `internal/scheduler/scheduler.go`. Each is a `func (s *Scheduler) <name>Loop(ctx context.Context)` driven by a `time.Ticker`. The intra-process guard pattern is `sync/atomic.Bool` `CompareAndSwap(false, true)` at the top of the loop body — pre-Bundle-4 every loop already had this guard. Bundle 4 added the cross-replica classification below.
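
The intra-process guard described above can be sketched as follows. This is an illustration of the `CompareAndSwap(false, true)` pattern only; `loopGuard` and `runOnce` are hypothetical names, not identifiers from `internal/scheduler/scheduler.go`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// loopGuard is a hypothetical wrapper around the atomic.Bool guard:
// it lets at most one run of a loop body execute at a time in this process.
type loopGuard struct {
	running atomic.Bool
}

// runOnce executes work unless a previous run is still in flight.
// It reports whether work actually ran.
func (g *loopGuard) runOnce(work func()) bool {
	if !g.running.CompareAndSwap(false, true) {
		return false // previous tick has not finished; skip this one
	}
	defer g.running.Store(false)
	work()
	return true
}

func main() {
	var g loopGuard
	outer := g.runOnce(func() {
		// A tick that fires while the previous one is still running is skipped.
		nested := g.runOnce(func() {})
		fmt.Println("nested tick ran:", nested)
	})
	fmt.Println("outer tick ran:", outer)
	fmt.Println("next tick ran:", g.runOnce(func() {}))
	// prints:
	// nested tick ran: false
	// outer tick ran: true
	// next tick ran: true
}
```

The flag lives in one process's memory, so it says nothing about what another replica is doing; that is why the cross-replica classification in the table is a separate question.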
|
||||
|
||||
| # | Loop | HA mode | Side-effect duplication risk under N>1 replicas |
|
||||
|---|---|---|---|
|
||||
| 1 | `renewalCheckLoop` | **Idempotent** — creates renewal jobs via `service.CheckExpiringCertificates`. | None. Duplicate ticks try to create the same `RenewalRequested` job; service-layer dedup (cert_id + status uniqueness window) collapses the second. Result: 2× CPU, 1× job, no data corruption. |
|
||||
| 2 | `jobProcessorLoop` | **HA-safe** — `service.ProcessPendingJobs` ultimately calls `repository/postgres.JobRepository.ClaimPendingJobs` which uses `SELECT ... FOR UPDATE SKIP LOCKED`. | None. Postgres guarantees exactly-once row claim per tick across the replica set. |
|
||||
| 3 | `jobRetryLoop` | **HA-safe** — `service.RetryFailedJobs` uses the same `ClaimPendingJobs` primitive (Bundle 1 audit fix H-6, commit `6cb4414`). | None. |
|
||||
| 4 | `jobTimeoutLoop` | **HA-safe-ish** — `service.TimeoutStalledJobs` UPDATEs with `WHERE status = 'Running' AND started_at < $cutoff` inside a single statement. Two replicas may UPDATE the same row but the second UPDATE sees `Running → Failed` already applied and matches zero rows. | None. |
|
||||
| 5 | `agentHealthCheckLoop` | **Idempotent** — UPDATEs `agents SET operational_status = 'Offline' WHERE last_heartbeat < $cutoff`. Two replicas running the same UPDATE land the same final state. | None. |
|
||||
| 6 | `notificationProcessLoop` | **Duplicates** — reads pending notification queue, dispatches to Slack / PagerDuty / SMTP / Teams / OpsGenie, marks dispatched. The dispatch and the "mark dispatched" are not in a single transaction; two replicas can both dispatch the same notification before the mark lands. | **Duplicate webhook + email sends**. Bounded — at most N duplicates for N replicas — but operator-observable. |
|
||||
| 7 | `notificationRetryLoop` | **Duplicates** — same shape as `notificationProcessLoop`. | Same as #6. |
|
||||
| 8 | `shortLivedExpiryCheckLoop` | **Idempotent** — UPDATEs cert status to `Expired` based on `expires_at < NOW()`. Two replicas land the same status. | None. |
|
||||
| 9 | `networkScanLoop` | **Idempotent** — invokes `service.NetworkScanService.ScanAllEnabledTargets` which iterates scan targets, probes each, and INSERTs discovered certs with `ON CONFLICT (fingerprint, agent_id, source_path) DO NOTHING`. | None on cert insertion. Duplicate TLS probes hit the operator's targets twice per tick. Operator may want to cap to 1 replica for low-egress-budget environments. |
|
||||
| 10 | `digestLoop` | **Duplicates** — assembles the periodic digest email and dispatches via SMTP. Two replicas at the same digest tick both send. | **Duplicate digest emails**. |
|
||||
| 11 | `healthCheckLoop` | **Idempotent** — runs the active TLS-fingerprint health-check sweep across deployed certs. Same idempotency story as #8. | None on state. Duplicate TLS probes to operator targets. |
|
||||
| 12 | `cloudDiscoveryLoop` | **Duplicates the scan; idempotent on the result store** — fetches cert lists from AWS Secrets Manager / Azure Key Vault / GCP Secret Manager, INSERTs into discovered-certs with `ON CONFLICT DO NOTHING`. | **Duplicate AWS/Azure/GCP API calls** — bills operator cloud accounts 2× per tick on the discovery API surface. Storage stays clean. |
|
||||
| 13 | `crlGenerationLoop` | **Duplicates the signing; last-writer wins on storage** — regenerates CRL DER blobs per issuer, writes to `certificate_revocation_lists` table with `UPDATE ... WHERE issuer_id = $1`. Two replicas sign two CRLs with two `thisUpdate` timestamps; the later UPDATE wins. | **Duplicate CA signing operations** (cost on HSM-backed issuers). CRL output is single-valued but the audit trail records both signings. |
|
||||
| 14 | `acmeGCLoop` | **Idempotent** — DELETEs ACME nonce / authz / order rows older than the retention window. Two replicas race the same DELETEs; second one matches zero rows. | None. |
|
||||
| 15 | `sessionGCLoop` | **Idempotent** — DELETEs expired session rows. Same shape as #14. | None. |
|
||||
|
||||
## What Bundle 4 closes
|
||||
|
||||
Bundle 4 does NOT introduce leader election. It introduces:
|
||||
|
||||
1. **Documented HA truth table** (this page) — operators know exactly which loops are safe to multi-replica and which produce operator-observable duplicates.
|
||||
2. **Migration HA** via `pg_advisory_lock` + `schema_migrations` audit table (see `internal/repository/postgres/db.go::RunMigrations`). Pre-Bundle-4 every replica race-ran the full migrations directory on boot (count via `ls migrations/*.up.sql | wc -l`). Post-Bundle-4 the first replica acquires the lock, applies migrations, populates `schema_migrations`, releases the lock. Subsequent replicas block at the lock, then observe the audit table and skip every already-applied file.
|
||||
3. **Rate-limit scope statement** at `docs/operator/rate-limit-scope.md` — process-local per-replica, restart-safe.
|
||||
|
||||
## What Bundle 4 does NOT close (deferred, tracked in WORKSPACE-ROADMAP.md)
|
||||
|
||||
- **Leader election** for `notificationProcessLoop`, `notificationRetryLoop`, `digestLoop`, `cloudDiscoveryLoop`, `crlGenerationLoop`. The cleanest implementation is a per-loop `pg_try_advisory_lock(lock_id)` at the top of `runX` so only one replica per tick claims the work, with a small leader-renewal mechanic for long-running loops. This would close the duplicate-side-effect cases for all five loops. v3 work item.
|
||||
- **Shared rate limits across replicas**. See `docs/operator/rate-limit-scope.md`.
|
||||
|
||||
## Operator guidance
|
||||
|
||||
**Single-replica deployments (Helm `server.replicas: 1` — the chart default)**: all 15 loops work as documented. No action needed.
|
||||
|
||||
**Multi-replica deployments**: review the five duplicate-side-effect loops above against your tolerance:
|
||||
|
||||
- If your alerting fan-out can swallow duplicate webhooks (PagerDuty deduplicates by `dedup_key`, Slack does not), set `server.replicas > 1` and accept the duplication.
|
||||
- If your CRL signing uses an HSM with per-operation cost, pin to single-replica until leader election lands.
|
||||
- If you're running cloud discovery against billed AWS/Azure/GCP secret-manager APIs, every extra replica repeats the discovery calls. At a 6 h discovery interval the extra call volume is usually bearable; at 30 min intervals the multiplied API spend becomes noticeable.
|
||||
|
||||
For any duplicate-side-effect class above, the operational mitigation is pinning `server.replicas: 1` and scaling vertically. The certctl-server process is CPU-bound on issuance and IO-bound on Postgres; a single replica handles substantial fleets when given enough cores + a fast database.
|
||||
|
||||
## Source-of-truth references
|
||||
|
||||
- Scheduler loops: `internal/scheduler/scheduler.go` (15 `<name>Loop` functions, search `^func \(s \*Scheduler\) [a-zA-Z]+Loop`).
|
||||
- Job claim primitive: `internal/repository/postgres/job.go::ClaimPendingJobs` (Bundle 1 H-6 closure, commit `6cb4414`).
|
||||
- Migration HA: `internal/repository/postgres/db.go::RunMigrations` (Bundle 4 closure).
|
||||
- Rate-limit scope: `docs/operator/rate-limit-scope.md`.
|
||||
- Load-test scope: `deploy/test/loadtest/README.md` ("What it explicitly does NOT measure").
|
||||
@@ -1,54 +0,0 @@
|
||||
# Bundle 5 Security Audit Closure
|
||||
|
||||
> Last reviewed: 2026-05-13
|
||||
|
||||
Closure summary for Bundle 5 of the 2026-05-12 acquisition diligence audit — the auth / OIDC / MCP / API / browser-security edge-case pass. Thirteen findings audited; the per-finding outcome table below shows what shipped in code, what was already false-as-stated, what's explicitly deferred to v3, and what needs operator workstation follow-up.
|
||||
|
||||
## Security matrix
|
||||
|
||||
| ID | Sev | Title | Status | Where |
|
||||
|---|---|---|---|---|
|
||||
| **finding 1** | Med | Auth architecture doc conflicts with shipped OIDC/session/break-glass | **Closed (doc)** | `docs/reference/architecture.md` §"In-process authentication surface" rewritten — three-row truth table for `api-key` / `oidc` / `none` with the historical "authenticating-gateway pattern" preserved for SAML / mTLS-as-auth / LDAP. |
|
||||
| **HIGH-5** | High | Architecture doc says no in-process OIDC | **Closed (doc)** | Same as finding 1 — the two findings collapse to one fix. |
|
||||
| **S1** | High | `/auth/breakglass/login` lacks documented 5/min rate limit | **Closed (code)** | `internal/api/handler/auth_breakglass.go::AuthBreakglassHandler.loginLimiter` + `SetLoginRateLimiter` (5/min/IP, 50 000-key cap). Wired at startup in `cmd/server/main.go` (sliding-window limiter via `internal/ratelimit`). Handler returns 429 on cap-hit. Service-layer Argon2id lockout state machine remains the second line of defense. |
|
||||
| **S3** | Med | Named API keys parsed but validation requires `Secret` | **Operator decision** | `CERTCTL_API_KEYS_NAMED` is parsed into `cfg.Auth.NamedKeys` at startup. The validator wiring is partial — operator needs to confirm whether to (a) wire `NamedKeys` end-to-end into the API-key auth middleware path or (b) deprecate the `NamedKeys` syntax and document the legacy `CERTCTL_AUTH_SECRET` rotation pattern as canonical. v3 work item. |
|
||||
| **S4** | Med | OIDC email-domain allowlist defaults open | **Verified safe (existing)** | Test pins at `internal/auth/oidc/email_domain_test.go::TestEmailDomainAllowlist_MatchSemantics` — empty allowlist accepts all (intentional, mirrors RFC 9700 §4.1.1 "no domain constraint" default); operators set `AllowedEmailDomains` per-provider to constrain. `ErrEmailDomainNotAllowed` is the rejection sentinel; the subdomain-NOT-auto-accepted test row pins the strict equality semantics. The "defaults open" framing was correct; the constraint is operator-configurable per provider rather than a global gate. |
|
||||
| **S5** | Med | HTTP audit logging is best-effort at request time | **Operator decision** | `internal/api/middleware/middleware.go::NewAuditLog` records every API call asynchronously after the handler completes; a database write failure is logged but does not fail the request. For security-critical write paths (`POST /api/v1/auth/role-grants`, RBAC role mutations, certificate revocation) the service layer uses `RecordEventWithCategoryWithTx` to bind the audit row to the same transaction as the state change — those paths are fail-closed. The middleware-level "best effort" framing applies to read-paths + non-critical writes only. Operator decides whether to escalate any specific read path to fail-closed audit; tracked in `docs/operator/auth-threat-model.md`. |
|
||||
| **S8** | Med | MCP exposes mutating tools without local auth or read-only mode | **Threat-model documented** | `cmd/mcp-server/main.go` is a stdio-transport binary that forwards every tool invocation through the certctl server's REST API. Every tool call carries the operator-supplied MCP API key and is authenticated + RBAC-gated server-side identically to a CLI call. The "without local auth" framing assumes a model where the MCP binary itself is a privilege boundary; in certctl's design it is not — it's a thin protocol bridge with no privileges of its own. The threat model + an optional read-only env-var gate (which would short-circuit any tool whose name doesn't match `^list_|^get_|^describe_`) is tracked in `WORKSPACE-ROADMAP.md` as a v3 hardening item; the env var itself is not yet defined in `internal/config/config.go`. |
|
||||
| **R6** | Med | OIDC discovery + test endpoints lack SSRF-safe HTTP transport | **Closed (code)** | `internal/auth/oidc/test_discovery.go::jwksReachable` now uses an `http.Client` whose transport wraps `validation.SafeHTTPDialContext(oidcOutboundTimeout)`. Pre-Bundle-5 the probe used `http.DefaultClient` — a JWKS URI pointing at `169.254.169.254` could pivot into instance metadata. Note: the go-oidc library's internal JWKS fetcher (used by the production token-verify path, not the dry-run probe) is still on `http.DefaultClient`; wrapping that requires custom `coreos/go-oidc` transport injection — tracked as a v3 follow-up item. |
|
||||
| **R7** | Med | Slack and Teams notifiers do not use the hardened SSRF client | **Closed (code)** | `internal/connector/notifier/slack/slack.go::New` and `internal/connector/notifier/teams/teams.go::New` both build their `http.Client` with `validation.SafeHTTPDialContext`. Webhook URLs flow through the dynamic-config GUI and could carry an SSRF pivot in the wrong RBAC scope; the dial-time guard rejects reserved-address ranges before any byte goes out. Mirrors the existing `internal/connector/notifier/webhook` hardening. |
|
||||
| **SEC-H1** | High | 4 open CRIT items from 2026-05-10 auth audit block v2.1.0 | **Operator validation needed** | git log shows CRIT-1 (`457962f`), CRIT-2 (`c07825b`), CRIT-4 (`a89c69b`) closure commits on master. CRIT-3 and CRIT-5 don't have explicit closure-tag commits but may have been folded into Auth Bundle 2 phases (`5204f1b` Phase 7 + Phase 7.5 covers break-glass + OIDC-first-admin). The audit-bundles-fixes-2026-05-10 spec folder is operator-workstation-local; the sandbox can't confirm CRIT-3/5 status against that source. Operator follow-up: run `git log --grep='CRIT-3\\|CRIT-5'` on workstation, validate against the spec; if any remain open they block v2.1.0 tag (per CLAUDE.md `v2.1.0 gate`). |
|
||||
| **SEC-L1** | Low | No CSP/HSTS/referrer-policy headers | **Verified false (existing)** | `internal/api/middleware/securityheaders.go` ships HSTS / X-Frame-Options / X-Content-Type-Options / Referrer-Policy / Content-Security-Policy via the `SecurityHeaders` middleware. Wired into the chain at `cmd/server/main.go` L2003 + L2027 + L2115 (applied to every gated handler). The audit framing was stale. |
|
||||
| **SEC-L2** | Low | No 2FA/WebAuthn/step-up auth | **Documented defer** | Already tracked in `docs/operator/auth-threat-model.md` ("Threats Bundle 2 does NOT close" enumeration). WebAuthn / FIDO2 + JIT elevation are v3 work items per CLAUDE.md `v2.1.0 gate` decision 12. |
|
||||
| **RT-L2** | Low | `CERTCTL_ACME_INSECURE=true` disables TLS verification with no startup warning | **Closed (code)** | `cmd/server/main.go` now emits a prominent `logger.Warn` at boot when `cfg.ACME.Insecure == true`. Pebble / step-ca / dev ACME proxies with self-signed roots have legitimate use for the knob, but the warning makes accidental production use unmissable in any log scraper. |
|
||||
|
||||
## Verification
|
||||
|
||||
```
gofmt -l .                                    # clean (no diffs in touched files)
go vet ./...                                  # clean
go build ./cmd/server ./internal/connector/notifier/slack \
    ./internal/connector/notifier/teams ./internal/auth/oidc \
    ./internal/api/handler                    # clean
go test -short -count=1 ./internal/connector/notifier/slack \
    ./internal/connector/notifier/teams       # PASS (existing notifier tests
                                              # still green; SSRF guard is a
                                              # transport wrap, contract
                                              # unchanged)
```
|
||||
|
||||
## Receipts
|
||||
|
||||
- Auth surface doc rewrite: `docs/reference/architecture.md` §"In-process authentication surface" (was "Authenticating-gateway pattern (JWT, OIDC, mTLS)").
|
||||
- Break-glass rate limiter: `internal/api/handler/auth_breakglass.go::AuthBreakglassHandler.loginLimiter` + `cmd/server/main.go` wiring block.
|
||||
- ACME-insecure startup warning: `cmd/server/main.go` `cfg.ACME.Insecure` block.
|
||||
- OIDC SSRF-safe dial: `internal/auth/oidc/test_discovery.go::jwksReachable` + `oidcOutboundTimeout` constant.
|
||||
- Slack/Teams SSRF-safe dial: `internal/connector/notifier/slack/slack.go::New` + `internal/connector/notifier/teams/teams.go::New`.
|
||||
|
||||
## Source IDs closed
|
||||
|
||||
| Closed via code | Closed via doc | Verified false (existing impl) | Operator follow-up | Documented defer |
|---|---|---|---|---|
| S1, R6, R7, RT-L2 | finding 1, HIGH-5, S8 (threat-model framing) | S4, SEC-L1 | S3, S5, SEC-H1 | SEC-L2 |
|
||||
|
||||
Closes Bundle 5 audit pass. Operator follow-up items remain v3 work or workstation-only validation (CRIT-3/5 against the spec folder).
|
||||