ci(codeql): teach analyzer about ValidateSafeURL SSRF barrier

Closes CodeQL alert #23 (go/request-forgery, Critical) at the structural level — by telling CodeQL what the runtime code already does — rather than via per-line `// codeql[...]` suppressions. Background. internal/service/scep_probe.go:232 calls client.Do(req) where the request URL is built from operator-supplied input. The runtime defense is two-layer: 1. validation.ValidateSafeURL(rawURL) at scep_probe.go:86 rejects non-http(s) schemes, empty hosts, literal-IP hosts in reserved ranges (loopback, link-local incl. cloud metadata 169.254.169.254, multicast, broadcast, unspecified, IPv6 link-local), and DNS names whose A/AAAA resolution returns any reserved IP. RFC 1918 is intentionally NOT blocked — see internal/validation/ssrf.go:17-21 for the design rationale. 2. validation.SafeHTTPDialContext on the http.Transport (line 254) re-resolves at dial time, applies the same reserved-IP set, and pins the dial to a literal non-reserved IP — defeating DNS rebinding between validate and dial. CodeQL's go/request-forgery query is a syntactic taint-tracking rule with no built-in knowledge of either validator, so it reports the finding even though the runtime is correctly defended. The fix. Add a Models-as-Data (MaD) extension at .github/codeql/ declaring ValidateSafeURL as a request-forgery barrier. The barrier applies to Argument[0] (the URL parameter), which means the analyzer treats every URL flowing through ValidateSafeURL as sanitized for the request-forgery taint set. After this lands: - Alert #23 dismisses at scep_probe.go:232. - The same model applies to the second site of this exact shape — webhook notifier's outbound client.Do (internal/connector/ notifier/webhook/webhook.go) — without per-line annotations. - Future code that flows operator URLs through ValidateSafeURL inherits the barrier automatically. This is the structural fix, not a band-aid: - Band-aid (rejected): `// codeql[go/request-forgery]` suppression on line 232. Suppresses one alert; doesn't teach the analyzer. Webhook notifier would need the same comment when its sibling rule landing fires. - Structural (this change): teach CodeQL via models-as-data, in config checked into the repo, that lives next to the workflow that uses it. The validators ARE sanitizers in the runtime — this PR makes the analyzer's model match reality. Files: - .github/codeql/qlpack.yml — local model pack manifest, declares extensionTargets: codeql/go-all: '*' - .github/codeql/models/request-forgery-sanitizers.model.yml — barrierModel row for validation.ValidateSafeURL Argument[0] / request-forgery taint kind / manual provenance - .github/codeql/codeql-config.yml — references the local pack + keeps security-and-quality query suite scope - .github/workflows/codeql.yml — Initialize CodeQL step picks up config-file: ./.github/codeql/codeql-config.yml. The existing `queries: security-and-quality` line stays so even if the config file fails to load, the suite scope is preserved. - docs/architecture.md::Input Validation and SSRF Protection — extended to name the egress validators (ValidateSafeURL + SafeHTTPDialContext) and the call sites (SCEP probe + webhook notifier). Closes the docs gap surfaced during the audit; the egress threat-model previously lived only in source comments. Requires CodeQL CLI ≥ 2.25.2 for the barrierModel extensible predicate (Go MaD support added 2026-04-21). github/codeql-action@v3 ships a recent enough CLI by default; if a future analysis fails with "unknown extensible predicate barrierModel", the action's CLI has regressed below 2.25.2 — pin a newer action version rather than reverting this pack. Documented inline in qlpack.yml. References: - https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-go/ - https://github.blog/changelog/2026-04-21-codeql-now-supports-sanitizers-and-validators-in-models-as-data/
2026-07-29 15:38:55 +00:00 · 2026-05-01 00:28:26 +00:00
parent 69266c8ed2
commit d8026d5f67
5 changed files with 146 additions and 0 deletions
@@ -0,0 +1,36 @@
 # CodeQL analysis config for certctl.
 #
 # Loaded by .github/workflows/codeql.yml's `Initialize CodeQL` step via
 # `config-file:`. Two responsibilities:
 #
 #  1. Re-declare the query suite the workflow runs (security-and-quality)
 #     so that disabling the action's default suite via `disable-default-
 #     queries: true` doesn't accidentally drop coverage.
 #
 #  2. Load the local model pack at .github/codeql/, which adds project-
 #     specific Models-as-Data extensions (sanitizers, sinks, summaries)
 #     for the standard Go queries. See ./qlpack.yml for the full motivation.
 #
 # Path-ignore is intentionally empty — every path that ships with the
 # repo is in scope. Test files are NOT excluded; if a vulnerability
 # regresses in a test fixture and is later promoted to production, we
 # want CodeQL to catch it on first appearance.
 name: certctl-codeql
 # Run the same query suite the workflow has been running pre-config-file:
 # security-and-quality (security findings + maintainability/correctness).
 # Listing it here ensures the suite stays in scope even if the action's
 # default behavior shifts.
 queries:
  - uses: security-and-quality
 # Load the local model pack. This is what makes the SSRF sanitizer
 # barrier rows in models/request-forgery-sanitizers.model.yml apply to
 # the standard go/request-forgery query.
 #
 # `${{ }}` is not used here — the path is relative to the config file's
 # directory, not to the repo root, per CodeQL action docs.
 packs:
  go:
    - ./
@@ -0,0 +1,57 @@
 # Models-as-Data sanitizer rows for the go/request-forgery query.
 #
 # Each row in `data` is a 9-tuple matching the `barrierModel` extensible
 # predicate signature for Go:
 #
 #   (package, type, subtypes, name, signature, ext, output, kind, provenance)
 #
 # Where:
 #   - package    — Go import path of the sanitizer
 #   - type       — receiver type ("" for package-level functions)
 #   - subtypes   — false for non-method functions; true to apply to subtypes
 #   - name       — function/method name
 #   - signature  — empty for Go (the column is not used)
 #   - ext        — empty (reserved)
 #   - output     — access path that becomes the barrier; "Argument[N]" means
 #                  the Nth argument is sanitized after the call
 #   - kind       — taint kind the barrier applies to ("request-forgery" for
 #                  the go/request-forgery query)
 #   - provenance — origin tag for the model row ("manual" — hand-authored)
 #
 # References:
 #   - https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-go/
 #   - https://github.blog/changelog/2026-04-21-codeql-now-supports-sanitizers-and-validators-in-models-as-data/
 extensions:
  # ---------------------------------------------------------------------------
  # validation.ValidateSafeURL — primary egress URL validator.
  # ---------------------------------------------------------------------------
  # Signature: func ValidateSafeURL(rawURL string) error
  #
  # Rejects (returns non-nil error for):
  #   - non-http/https schemes (file://, gopher://, ftp://, data:, etc.)
  #   - URLs missing a host
  #   - literal-IP hosts in: 127/8 + ::1 (loopback), 169.254.0.0/16 (link-
  #     local incl. AWS/Azure/GCP cloud metadata at 169.254.169.254),
  #     224.0.0.0/4 + ff00::/8 (multicast), 255.255.255.255 (broadcast),
  #     0.0.0.0 + :: (unspecified), fe80::/10 (IPv6 link-local)
  #   - DNS names whose A/AAAA resolution returns any IP in the set above
  #
  # Source of truth: internal/validation/ssrf.go (ValidateSafeURL +
  # IsReservedIP + isReservedIPForDial). RFC 1918 (10/8, 172.16/12,
  # 192.168/16) is intentionally NOT blocked — see the comment block at
  # ssrf.go:17-21 for the design rationale.
  #
  # The companion runtime defense is SafeHTTPDialContext (installed on the
  # http.Transport via http.Transport.DialContext) which re-resolves the
  # host at dial time and pins the dial to a literal non-reserved IP,
  # defeating DNS rebinding. SafeHTTPDialContext returns a closure rather
  # than acting as a direct sanitizer in dataflow terms, so it isn't
  # modeled here — but ValidateSafeURL alone is sufficient to dismiss the
  # request-forgery alerts at the call sites that use it (scep_probe.go,
  # webhook.go).
  - addsTo:
      pack: codeql/go-all
      extensible: barrierModel
    data:
      - ["github.com/shankar0123/certctl/internal/validation", "", false, "ValidateSafeURL", "", "", "Argument[0]", "request-forgery", "manual"]
@@ -0,0 +1,42 @@
 # certctl CodeQL model pack — extends the standard Go queries with project-
 # specific data-flow knowledge (sanitizers, sinks, summaries).
 #
 # Why this exists: CodeQL's standard `go/request-forgery` query is a syntactic
 # taint-tracking rule. It traces operator-supplied URLs into HTTP egress sinks
 # (`http.Client.Do`) and reports — but it has no built-in knowledge of
 # certctl's `internal/validation.ValidateSafeURL` SSRF guard. The validator
 # IS a sanitizer (rejects loopback, link-local incl. cloud metadata
 # 169.254.169.254, multicast, broadcast, unspecified, IPv6 link-local;
 # rejects DNS names whose A/AAAA records resolve into any of those ranges)
 # but CodeQL doesn't know that, so the analyzer reports a finding the
 # runtime defense already mitigates.
 #
 # This pack uses Models-as-Data (MaD) extensions to declare the validator as
 # a barrier for the request-forgery query. After this pack is loaded:
 #   - The alert at internal/service/scep_probe.go:232 (CodeQL #23) is
 #     dismissed at source, not via per-line `// codeql[...]` suppression.
 #   - The same model applies to the second site of this shape — webhook
 #     notifier's outbound `client.Do` (internal/connector/notifier/webhook/
 #     webhook.go) — without per-line annotations.
 #   - Future code that flows operator URLs through ValidateSafeURL gets the
 #     same treatment automatically.
 #
 # Loading: codeql-config.yml's `packs:` field references this pack by its
 # `name` below. The `extensionTargets:` map declares which upstream pack the
 # extension data plugs into (codeql/go-all is the Go standard library pack).
 # The `dataExtensions:` glob matches the .model.yml files in models/.
 #
 # MaD `barrierModel` extension was added for Go in CodeQL 2.25.2 (2026-04-21).
 # `github/codeql-action@v3` (pinned in .github/workflows/codeql.yml) pulls a
 # CLI version >= 2.25.2 by default. If a future analysis fails with
 # "unknown extensible predicate barrierModel", the action's CLI version has
 # regressed below 2.25.2 — pin a newer action version rather than reverting
 # this pack.
 name: shankar0123/certctl-codeql-models
 version: 0.0.1
 library: true
 extensionTargets:
  codeql/go-all: '*'
 dataExtensions:
  - models/*.model.yml
@@ -70,6 +70,15 @@ jobs:
          # maintainability/correctness issues that the smaller security-extended
          # suite skips. Comparable scope to what Aikido / SonarCloud run.
          queries: security-and-quality
          # Custom config + model pack at .github/codeql/. The pack declares
          # internal/validation.ValidateSafeURL as a request-forgery barrier
          # via Models-as-Data, dismissing the alert that fires at every site
          # using the validator (scep_probe.go, webhook.go) without per-line
          # `// codeql[...]` suppressions. See .github/codeql/qlpack.yml for
          # the full motivation. Requires CodeQL CLI ≥ 2.25.2 for the
          # barrierModel extension; codeql-action@v3 ships a recent enough
          # CLI by default.
          config-file: ./.github/codeql/codeql-config.yml
      - name: Autobuild
        uses: github/codeql-action/autobuild@v3
@@ -993,6 +993,8 @@ Audit recording is async (via goroutine) so it never blocks the HTTP response. I
 All shell-facing inputs (connector scripts, domain names, ACME tokens) are validated through `internal/validation/command.go` before reaching shell execution. `ValidateShellCommand()` denies all shell metacharacters. `ValidateDomainName()` enforces RFC 1123. `ValidateACMEToken()` restricts to base64url characters. The network scanner filters reserved IP ranges (loopback, link-local including cloud metadata 169.254.169.254, multicast, broadcast) to prevent SSRF, while preserving RFC 1918 private ranges for legitimate internal scanning.
 **Outbound HTTP egress** is defended in two layers, both implemented in `internal/validation/ssrf.go`. Layer one is `ValidateSafeURL(rawURL)`, called at the surface of every outbound HTTP path that accepts an operator-supplied URL. It rejects non-`http(s)` schemes, empty hosts, literal-IP hosts in reserved ranges (loopback `127/8` + `::1`, IPv4 link-local `169.254/16` including AWS/Azure/GCP cloud metadata at `169.254.169.254`, IPv4 multicast `224/4`, IPv4 broadcast `255.255.255.255`, unspecified `0.0.0.0` + `::`, and IPv6 link-local `fe80::/10` + IPv6 multicast `ff00::/8`), and DNS names whose `A`/`AAAA` resolution returns any IP in that set. Layer two is `SafeHTTPDialContext(timeout)`, installed on the `http.Transport.DialContext` of every long-lived outbound HTTP client — it re-resolves the target host at dial time, applies the same reserved-IP set, and pins the TCP dial to a literal non-reserved IP so a racing DNS rebind cannot redirect the connection between validation and dial. RFC 1918 ranges (`10/8`, `172.16/12`, `192.168/16`) are intentionally NOT treated as reserved — certctl is designed to manage certificates inside private networks, and filtering them would break the primary use case. Current call sites: the SCEP probe in the network scanner (`internal/service/scep_probe.go`) and the webhook notifier (`internal/connector/notifier/webhook/`); both wire `ValidateSafeURL` at config + request entry and `SafeHTTPDialContext` on their dedicated `http.Client`. The egress validators are also declared as request-forgery barriers in the CodeQL Models-as-Data extension at `.github/codeql/models/`, so the static analyzer's view of the dataflow matches the runtime defense.
 ### Request Body Size Limits
 All incoming HTTP request bodies are capped by `http.MaxBytesReader` middleware (default 1MB, configurable via `CERTCTL_MAX_BODY_SIZE`). Requests exceeding the limit receive a 413 Request Entity Too Large response. The middleware is positioned before authentication in the chain so oversized payloads are rejected early, before any auth processing or database work occurs. Requests without bodies (GET, HEAD, nil body) skip the limit check.