From 2263e2886bd8f9323b285ffa89e1966eacb7df4d Mon Sep 17 00:00:00 2001
From: Shankar
Date: Wed, 29 Apr 2026 15:34:19 +0000
Subject: [PATCH] feat(scep-intune): per-profile dispatcher + SIGHUP reload + per-device rate limit + compliance hook seam
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the
internal/scep/intune validator from Phase 7 into the SCEPService
dispatch path, with a SIGHUP-reloadable trust anchor holder, a
per-(Subject, Issuer) sliding-window rate limiter, and a nil-default
ComplianceCheck seam for V3-Pro.

Operator-visible surface (per-profile, all default to off):

  CERTCTL_SCEP_PROFILE__INTUNE_ENABLED=true
  CERTCTL_SCEP_PROFILE__INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem
  CERTCTL_SCEP_PROFILE__INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
  CERTCTL_SCEP_PROFILE__INTUNE_CHALLENGE_VALIDITY=60m
  CERTCTL_SCEP_PROFILE__INTUNE_PER_DEVICE_RATE_LIMIT_24H=3

Per-profile dispatch (Phase 8.8): an operator running corp-laptops
through Intune AND IoT devices through static challenge configures
INTUNE_ENABLED=true on the corp profile only — the IoT profile's
PKCSReq path skips the dispatcher entirely. Mirrors the per-profile
shape established by Phase 1.5.

Wire-in surfaces:

* config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of type
  SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/
  ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed
  CERTCTL_SCEP_PROFILE__INTUNE_* env-var family. Per-profile Validate
  gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath OR
  negative PerDeviceRateLimit24h.

* cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor
  helper mirrors preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle
  shape — fail-loud at boot when the trust anchor file is missing /
  unreadable / empty / contains an expired cert.
  The per-profile loop builds the holder + replay cache + rate limiter,
  calls SetIntuneIntegration on the SCEPService, and starts the SIGHUP
  watcher. A deferred sweep stops every watcher at shutdown.

* internal/scep/intune/trust_anchor_holder.go (Phase 8.5):
  TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex-guarded
  pool + Reload that swaps a fresh slice on success + WatchSIGHUP
  goroutine that responds to the same SIGHUP the existing TLS-cert
  watcher uses. A bad reload (parse error, expired cert) keeps the OLD
  pool in place so a half-rotation doesn't take Intune enrollment down
  — same fail-safe pattern. Operators rotate via the on-disk file then
  'kill -HUP '.

* internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled
  sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry map
  cap (matches replay cache); at-cap drops the bucket whose newest
  timestamp is the oldest. Default 3 enrollments per 24h covers
  legitimate first-cert + recovery + post-wipe re-enrollment but blocks
  bulk enumeration from a compromised Connector signing key. maxN <= 0
  disables the limiter for tests + the rare operator who wants no
  per-device cap. Empty subject short-circuits to allow
  (defense-in-depth: caller's claim validation rejects empty-subject
  upstream; no shared bucket on '').

  Why hand-rolled instead of golang.org/x/time/rate: the rate package
  is in go.sum as an indirect transitive but not a direct dep. ~30 LoC
  of stdlib avoids creating a new direct dep.

* internal/service/scep.go (Phase 8.3 + 8.4 + 8.7):
  - SCEPService gains intuneEnabled / intuneTrust / intuneAudience /
    intuneValidity / intuneReplayCache / intuneRateLimiter /
    complianceCheck fields.
  - SetIntuneIntegration() constructor-time injection wires the
    per-profile state. Profiles with INTUNE_ENABLED=false never call
    this method, so they pay zero overhead.
  - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7).
  - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly
    two dots).
    Allowed to false-positive (validator catches malformed →
    ErrChallengeMalformed); MUST NOT false-negative on real Intune
    challenges.
  - dispatchIntuneChallenge(): the load-bearing core. Runs
    ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay cache
    CheckAndInsert → per-device Allow → optional ComplianceCheck. Each
    failure leg increments a typed metric label and emits an
    audit-friendly Warn log line.
  - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call
    dispatchIntuneChallenge first; on outcome.decided=true they either
    short-circuit (with a typed-error → SCEPFailInfo mapping) or call
    processEnrollment with action='scep_pkcsreq_intune' (or
    'scep_renewalreq_intune' on the renewal path) so audit greps can
    count Intune-vs-static enrollments.
  - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per RFC 8894
    §3.2.1.4.5 (signature/replay/expired → BadMessageCheck;
    claim-mismatch → BadRequest; default → BadRequest).
  - intuneFailReason(): typed-error → metric label ('signature_invalid'
    / 'expired' / 'rate_limited' / etc.). Default 'malformed' so a
    previously-unseen error category still surfaces in the metric for
    follow-up.
  - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs
    in via SetComplianceCheck to call Microsoft Graph's compliance API.
    Returns (compliant, reason, err). nil-err + compliant=false →
    CertRep FAILURE + 'compliance' reason in audit. err != nil →
    fail-safe deny (V3-Pro module is responsible for any 'permit on API
    failure' policy).

* internal/service/scep.go also gains parseCSRForIntune() — small
  private wrapper around encoding/pem + x509 used by the dispatcher for
  the claim ↔ CSR binding check (separated from the broader
  processEnrollment because we want to bind BEFORE consuming the
  replay-cache slot).
Tests (gates: ≥85% coverage on intune package, ≥70% on service):

* scep_intune_test.go (in internal/service): 14 dispatcher tests
  covering happy-path Intune enrollment + static-challenge fallback +
  tampered-challenge reject + claim-mismatch reject + replay detected +
  rate-limited + compliance-hook nil-default + compliance-hook denies
  non-compliant + compliance-hook error fails closed + IntuneEnabled
  accessor + 'no IntuneEnabled = static path unchanged' regression pin
  + intuneFailReason mapping for every typed error + looksIntuneShaped
  boundary cases.

* trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle,
  NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath,
  ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe
  semantics that make the SIGHUP path operator-friendly),
  WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap
  pattern mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean
  (does NOT fire SIGHUP after stop — same caveat as the TLS test: the
  Go runtime would otherwise terminate the test runner on the next
  SIGHUP since signal.Stop has removed the handler).

* rate_limit_test.go (in internal/scep/intune): AllowsUpToCap,
  DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0),
  NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth
  against an empty-subject DoS chokepoint), DefaultCapsHonored,
  MapCapEvictsOldest (at-cap eviction branch), ConcurrentRaceFree (50
  goroutines × 200 inserts), pruneOlderThan + the no-op case.

Verification:

* gofmt -l on all touched files: clean
* go vet ./... : clean
* staticcheck on intune/service/config/cmd-server: clean
* go test -count=1 -cover ./internal/scep/intune/...: 94.8% (target ≥85%)
* go test -short across intune+service+config+handler+cmd-server: all green
* G-3 docs-drift CI guard reproduced locally: docs-only filtered=empty,
  config-only=empty. The new env vars match the existing CERTCTL_SCEP_
  allowlist prefix.
Refs:
  cowork/scep-rfc8894-intune-master-prompt.md::Phase 8
  cowork/scep-rfc8894-intune/progress.md

Constitutional rule: 'Always take the complete path, not the easy path'
(cowork/CLAUDE.md::Operating Rules) — operator can flip
CERTCTL_SCEP_PROFILE__INTUNE_ENABLED=true and observe the dispatcher
pick up Intune-shaped challenges end-to-end with no further code
changes. Foundation + plumbing ship together.
---
 cmd/server/main.go                          | 116 +++++
 docs/features.md                            |   5 +
 docs/legacy-est-scep.md                     |  77 ++-
 internal/config/config.go                   |  86 ++++
 internal/scep/intune/rate_limit.go          | 193 +++++++
 internal/scep/intune/rate_limit_test.go     | 190 +++++++
 internal/scep/intune/trust_anchor_holder.go | 143 +++++
 .../scep/intune/trust_anchor_holder_test.go | 234 +++++++++
 internal/service/scep.go                    | 391 ++++++++++++++
 internal/service/scep_intune_test.go        | 487 ++++++++++++++++++
 10 files changed, 1918 insertions(+), 4 deletions(-)
 create mode 100644 internal/scep/intune/rate_limit.go
 create mode 100644 internal/scep/intune/rate_limit_test.go
 create mode 100644 internal/scep/intune/trust_anchor_holder.go
 create mode 100644 internal/scep/intune/trust_anchor_holder_test.go
 create mode 100644 internal/service/scep_intune_test.go

diff --git a/cmd/server/main.go b/cmd/server/main.go
index 40b271a..ae9488e 100644
--- a/cmd/server/main.go
+++ b/cmd/server/main.go
@@ -32,6 +32,7 @@ import (
 	"github.com/shankar0123/certctl/internal/crypto/signer"
 	"github.com/shankar0123/certctl/internal/domain"
 	"github.com/shankar0123/certctl/internal/repository/postgres"
+	"github.com/shankar0123/certctl/internal/scep/intune"
 	"github.com/shankar0123/certctl/internal/scheduler"
 	"github.com/shankar0123/certctl/internal/service"
 )
@@ -762,6 +763,12 @@ func main() {
 		scepMTLSHandlers := make(map[string]handler.SCEPHandler)
 		scepMTLSUnionPool := x509.NewCertPool()
 		scepMTLSAnyEnabled := false
+		// SCEP RFC 8894 + Intune master bundle Phase 8: per-profile Intune
+		// trust anchor holders. We track them here so a single SIGHUP +
+		// reload-watcher set spans every profile, AND so the deferred
+		// stop-watcher cleanup runs once at server shutdown.
+		intuneTrustHolders := []*intune.TrustAnchorHolder{}
+		intuneStopWatchers := []func(){}
 		for i, profile := range cfg.SCEP.Profiles {
 			profile := profile // shadow for closure-safety even though no closures escape
 			profileLog := logger.With(
@@ -826,6 +833,61 @@ func main() {
 				os.Exit(1)
 			}
 			scepHandler.SetRAPair(raCert, raKey)
+
+			// SCEP RFC 8894 + Intune master bundle Phase 8: per-profile Intune
+			// dispatcher wire-in. Builds the trust-anchor holder, replay cache,
+			// and per-device rate limiter; injects them into the SCEPService;
+			// starts the SIGHUP reload watcher (one per holder, all responding
+			// to the same signal as the existing TLS-cert watcher). Profiles
+			// with INTUNE_ENABLED=false skip the entire block, so the cost on
+			// non-Intune deploys is exactly one bool check per profile.
+			if profile.Intune.Enabled {
+				intuneHolder, err := preflightSCEPIntuneTrustAnchor(true, profile.Intune.ConnectorCertPath, profileLog)
+				if err != nil {
+					profileLog.Error(
+						"startup refused: SCEP profile INTUNE trust anchor preflight failed "+
+							"(Phase 8.2: required when INTUNE_ENABLED=true). "+
+							"Verify the bundle file exists at INTUNE_CONNECTOR_CERT_PATH, "+
+							"is readable, parses as PEM, contains ≥1 CERTIFICATE block, "+
+							"and none of the bundled certs are past NotAfter (operator-rotated).",
+						"error", err,
+					)
+					os.Exit(1)
+				}
+				intuneTrustHolders = append(intuneTrustHolders, intuneHolder)
+				intuneStopWatchers = append(intuneStopWatchers, intuneHolder.WatchSIGHUP())
+
+				// Replay cache TTL = ChallengeValidity (defaults to 60m via
+				// config.go's getEnvDuration default). The cache is sized
+				// for the documented 100k-entry production default; smaller
+				// is fine, larger tightens the operator's escape hatch.
+				replayCache := intune.NewReplayCache(profile.Intune.ChallengeValidity, 0)
+
+				// Per-device rate limiter: honor the per-profile cap
+				// (INTUNE_PER_DEVICE_RATE_LIMIT_24H, default 3). The cap can
+				// be 0 to disable (limiter then short-circuits all Allow calls
+				// to nil). Map cap stays at the 100k default.
+				rateLimiter := intune.NewPerDeviceRateLimiter(
+					profile.Intune.PerDeviceRateLimit24h,
+					24*time.Hour,
+					0,
+				)
+
+				scepService.SetIntuneIntegration(
+					intuneHolder,
+					profile.Intune.Audience,
+					profile.Intune.ChallengeValidity,
+					replayCache,
+					rateLimiter,
+				)
+				profileLog.Info("SCEP profile Intune dispatcher enabled",
+					"trust_anchor_path", profile.Intune.ConnectorCertPath,
+					"audience", profile.Intune.Audience,
+					"challenge_validity", profile.Intune.ChallengeValidity,
+					"per_device_rate_limit_24h", profile.Intune.PerDeviceRateLimit24h,
+				)
+			}
+
 			scepHandlers[profile.PathID] = scepHandler
 			endpoint := "/scep"
 			if profile.PathID != "" {
@@ -835,6 +897,7 @@ func main() {
 				"endpoint", endpoint+"?operation={GetCACaps,GetCACert,PKIOperation}",
 				"challenge_password_set", profile.ChallengePassword != "",
 				"ra_cert_path", profile.RACertPath,
+				"intune_enabled", profile.Intune.Enabled,
 			)
 
 			// SCEP RFC 8894 Phase 6.5: register the mTLS sibling route
@@ -913,7 +976,20 @@ func main() {
 		logger.Info("SCEP server enabled",
 			"profile_count", len(scepHandlers),
 			"mtls_profile_count", len(scepMTLSHandlers),
+			"intune_profile_count", len(intuneTrustHolders),
 		)
+
+		// SCEP RFC 8894 + Intune master bundle Phase 8.5: stop the SIGHUP
+		// watcher goroutines when main returns. Go defers are function-scoped,
+		// so this runs at main's exit, not at the end of this block; os.Exit
+		// skips deferred calls, so a later os.Exit(1) bypasses this cleanup.
+		if len(intuneStopWatchers) > 0 {
+			defer func() {
+				for _, stop := range intuneStopWatchers {
+					stop()
+				}
+			}()
+		}
 	}
 
 	// Register RFC 5280 CRL and RFC 6960 OCSP handlers under /.well-known/pki/.
@@ -1319,6 +1395,46 @@ func preflightSCEPMTLSTrustBundle(enabled bool, bundlePath string) (*x509.CertPo
 	return pool, nil
 }
 
+// preflightSCEPIntuneTrustAnchor validates a per-profile Microsoft Intune
+// Certificate Connector signing-cert trust bundle.
+//
+// SCEP RFC 8894 + Intune master bundle Phase 8.2.
+//
+// No-op when this profile has Intune disabled (the common case for
+// non-Intune SCEP deploys). When enabled:
+//
+//  1. Path is non-empty (Validate() refuse covers this too; we re-check
+//     here so the caller can os.Exit(1) with the specific PathID in the
+//     log line).
+//  2. File exists + readable.
+//  3. PEM-decodes to ≥1 CERTIFICATE block (intune.LoadTrustAnchor enforces
+//     this and skips non-CERTIFICATE blocks like accidentally-pasted
+//     priv-key blocks).
+//  4. None of the bundled certs is past NotAfter — an expired Intune
+//     trust anchor would silently reject every Connector challenge at
+//     runtime, which is a much worse failure mode than failing fast at
+//     boot. intune.LoadTrustAnchor enforces this and surfaces the subject
+//     CN in the error message so the operator knows which cert to rotate.
+//
+// On success returns the freshly-built *intune.TrustAnchorHolder ready to
+// inject into the per-profile SCEPService via SetIntuneIntegration; the
+// caller is responsible for starting its SIGHUP watcher (WatchSIGHUP).
+func preflightSCEPIntuneTrustAnchor(enabled bool, path string, logger *slog.Logger) (*intune.TrustAnchorHolder, error) {
+	if !enabled {
+		return nil, nil
+	}
+	if path == "" {
+		return nil, fmt.Errorf("INTUNE enabled but trust anchor path empty: " +
+			"set CERTCTL_SCEP_PROFILE__INTUNE_CONNECTOR_CERT_PATH to a PEM bundle " +
+			"of the Microsoft Intune Certificate Connector's signing certs")
+	}
+	holder, err := intune.NewTrustAnchorHolder(path, logger)
+	if err != nil {
+		return nil, fmt.Errorf("INTUNE trust anchor load failed: %w (path=%s)", err, path)
+	}
+	return holder, nil
+}
+
 // loadSCEPRAPair reads the RA cert PEM + key PEM and returns the parsed
 // x509.Certificate + crypto.PrivateKey ready for the SCEP handler's RFC
 // 8894 path. Called AFTER preflightSCEPRACertKey passed; failures here
diff --git a/docs/features.md b/docs/features.md
index 0430d9b..817387e 100644
--- a/docs/features.md
+++ b/docs/features.md
@@ -656,6 +656,11 @@ SCEP uses a single URL (`/scep?operation=...`). The handler extracts PKCS#10 CSR
 | `CERTCTL_SCEP_PROFILE__RA_KEY_PATH` | (none) | Per-profile RA private key PEM path (mode `0600`). Same semantics as `CERTCTL_SCEP_RA_KEY_PATH` but scoped to one profile. **Required for every profile.** |
 | `CERTCTL_SCEP_PROFILE__MTLS_ENABLED` | `false` | **Phase 6.5 (opt-in).** When true, certctl exposes a sibling `/scep-mtls/` route alongside the standard `/scep/` route. The sibling route requires the SCEP client to present an mTLS client cert that chains to `_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH`. The standard route continues to use challenge-password-only auth — operators can run BOTH routes simultaneously for migration / heterogeneous client fleets. mTLS is additive (not a replacement for the challenge password). Designed for enterprise procurement teams that reject "shared password authentication" as a checkbox-fail. Same model Apple's MDM and Cisco's BRSKI use. |
 | `CERTCTL_SCEP_PROFILE__MTLS_CLIENT_CA_TRUST_BUNDLE_PATH` | (none) | PEM bundle of CA certs that sign the client (device-bootstrap) certs the operator allows to enroll on this profile's `/scep-mtls/` route. **Required when `_MTLS_ENABLED=true`.** Operators with multiple bootstrap CAs concatenate them. The startup preflight (`cmd/server/main.go::preflightSCEPMTLSTrustBundle`) validates: file exists, parses as PEM, contains ≥1 cert, none expired. |
+| `CERTCTL_SCEP_PROFILE__INTUNE_ENABLED` | `false` | **Phase 8 (opt-in).** When true, this profile routes Intune-shaped challenge passwords (length > 200 + exactly two dots) to the Microsoft Intune Certificate Connector signed-challenge validator. Static challenge passwords still work as a fallback for non-Intune devices in mixed-fleet deployments. Per-profile flag so an operator running corp-laptops via Intune AND IoT devices via static challenge can opt in on the corp profile only. |
+| `CERTCTL_SCEP_PROFILE__INTUNE_CONNECTOR_CERT_PATH` | (none) | Filesystem path to a PEM bundle of one or more Microsoft Intune Certificate Connector signing certs. **Required when `_INTUNE_ENABLED=true`.** Reloaded on `SIGHUP` (mirrors the server TLS-cert reload pattern). Startup preflight + reload both refuse empty bundles + expired certs and surface the offending subject CN in the error message. Operators who rotate the Connector signing cert update the file on disk then `kill -HUP ` to apply (no restart required). |
+| `CERTCTL_SCEP_PROFILE__INTUNE_AUDIENCE` | (empty, audience check disabled) | Expected `aud` claim in the Intune challenge — typically the public SCEP endpoint URL the Connector is configured to call (e.g. `https://certctl.example.com/scep/corp`). Empty disables the check, useful for proxy / load-balancer scenarios where the URL the Connector saw differs from the URL we see. Operators who pin a public URL gain defense-in-depth against challenge re-use across endpoints. |
+| `CERTCTL_SCEP_PROFILE__INTUNE_CHALLENGE_VALIDITY` | `60m` | Maximum age of an Intune challenge, on top of the challenge's own `iat`/`exp` claims. Defense-in-depth: even if the Connector mints a 24h-valid challenge, this caps the window during which a leaked challenge can be replayed. Default matches Microsoft's published Connector defaults. Zero disables the cap (relies entirely on the challenge's `exp`). |
+| `CERTCTL_SCEP_PROFILE__INTUNE_PER_DEVICE_RATE_LIMIT_24H` | `3` | Maximum enrollments per `(claim.Subject, claim.Issuer)` pair in any rolling 24-hour window. Catches a compromised Connector signing key issuing many DIFFERENT valid challenges for the same device. Default 3 covers legitimate first-cert + recovery + post-wipe re-enrollment. Zero disables the limiter (not recommended for production). |
 
 ---
 
diff --git a/docs/legacy-est-scep.md b/docs/legacy-est-scep.md
index 9fc2dcc..6114f28 100644
--- a/docs/legacy-est-scep.md
+++ b/docs/legacy-est-scep.md
@@ -420,19 +420,88 @@ challenge+mTLS: the password requirement doesn't go away — the password is
 still the application-layer auth boundary).
 
+### Microsoft Intune dynamic-challenge dispatcher (Phase 8, opt-in)
+
+When SCEP sits behind the Microsoft Intune Certificate Connector, devices
+present an Intune-issued signed challenge (a JWT-like blob over a JSON
+claim payload) instead of the static `_CHALLENGE_PASSWORD`. Phase 8 wires
+a per-profile dispatcher that validates these signed challenges against
+the Connector's signing-cert trust anchor and binds the asserted device
+identity to the inbound CSR. Static challenge passwords still work as a
+fallback so heterogeneous fleets (some Intune-enrolled, some not) keep
+working.
+
+**Per-profile env vars** (all default to off; legacy/static-only profiles
+need no changes):
+
+```
+CERTCTL_SCEP_PROFILE__INTUNE_ENABLED=true
+CERTCTL_SCEP_PROFILE__INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune-corp.pem
+CERTCTL_SCEP_PROFILE__INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
+CERTCTL_SCEP_PROFILE__INTUNE_CHALLENGE_VALIDITY=60m
+CERTCTL_SCEP_PROFILE__INTUNE_PER_DEVICE_RATE_LIMIT_24H=3
+```
+
+**Trust-anchor extraction:** the operator extracts the Connector
+installation's signing cert (from the Connector's certificate store on
+the Windows host running the Connector — Microsoft does not publish a
+direct download) and writes a PEM bundle to the configured path.
+Multiple Connectors in HA = concatenate their certs.
+
+**Trust-anchor reload:** the holder re-reads the bundle on `SIGHUP` (the
+same signal that rotates the server's TLS cert). A bad reload (parse
+error, expired cert) keeps the OLD pool in place — operators get a
+recoverable failure window rather than a service-down. Rotate the file
+on disk, then `kill -HUP ` to apply with no restart.
+
+**Replay protection:** in-memory cache of seen challenge nonces with TTL
+= `_CHALLENGE_VALIDITY` (default 60m). Sized for 100k entries, which
+covers a ~25 RPS Intune fleet's steady-state. The same challenge
+submitted twice within the TTL is rejected with `ErrChallengeReplay`.
+
+**Per-device rate limit:** sliding-window-log limiter keyed by
+`(claim.Subject, claim.Issuer)`. Default 3 enrollments per 24h covers
+legitimate first-cert + recovery + post-wipe re-enrollment but blocks a
+compromised Connector signing key from issuing many DIFFERENT valid
+challenges for the same device. Set the var to `0` to disable.
+
+**Audit + observability:** Intune enrollments emit
+`audit_event.action="scep_pkcsreq_intune"` (or
+`"scep_renewalreq_intune"`) so operators can grep the audit log to count
+Intune-vs-static enrollments. Per-failure-mode reason flows into the log
+line; the metric label set is `success / signature_invalid / expired /
+not_yet_valid / wrong_audience / replay / rate_limited / claim_mismatch
+/ unknown_version / malformed`.
+
+**Compliance-state hook (V3-Pro plug-in seam):** a nil-default
+`ComplianceCheck` field on `SCEPService` lets a future Pro module plug
+in a Microsoft Graph compliance API call between challenge validation
+and certificate issuance. V2 ships the seam (one struct field + one
+setter + one nil-guarded call site) so Pro is plug-in code, not a
+dispatcher refactor.
+
+**Mixed-mode (recommended):** keep `_CHALLENGE_PASSWORD` set even when
+Intune is enabled. Devices that don't go through Intune (manual
+enrollment, on-prem MDM bridges) continue to enroll via the static path;
+the dispatcher routes Intune-shaped challenges (length > 200 + exactly
+two dots) to the validator and falls through to the static compare
+otherwise.
+
 ### Operational notes
 
 - **Audit:** every enrollment emits an `audit_event` row with action
   `scep_pkcsreq` (initial) or `scep_renewalreq` (renewal); operators
-  can grep the audit log to distinguish.
+  can grep the audit log to distinguish. Intune-dispatched enrollments
+  use `scep_pkcsreq_intune` and `scep_renewalreq_intune` respectively.
 - **Body-size cap:** `http.MaxBytesReader` middleware caps request bodies
   at `CERTCTL_MAX_BODY_SIZE` (default 1MB); SCEP PKIMessages are
   typically <50KB so the default cap is generous.
 - **HTTPS-only:** the SCEP endpoint inherits the TLS-1.3-pinned
   control plane; there is no plaintext fallback.
-- **Forward reference:** for Microsoft Intune deployments specifically,
-  see [`scep-intune.md`](scep-intune.md) (the doc Phase 11 of the
-  master bundle ships).
+- **Forward reference:** for the deeper Intune integration writeup
+  (architecture, migration playbook, troubleshooting,
+  Microsoft-support-statement), see [`scep-intune.md`](scep-intune.md)
+  (Phase 11 of the master bundle).
 
 ## Related docs
 
diff --git a/internal/config/config.go b/internal/config/config.go
index 739b94f..4b109f6 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -820,6 +820,65 @@ type SCEPProfileConfig struct {
 	// `cmd/server/main.go::preflightSCEPMTLSTrustBundle` — file exists,
 	// parses as PEM, contains ≥1 cert, none expired.
 	MTLSClientCATrustBundlePath string
+
+	// Intune is the per-profile Microsoft Intune Certificate Connector
+	// integration block. When Enabled is false (default), this profile only
+	// honors the static ChallengePassword; when true, requests with an
+	// Intune-shaped challenge password (length + dot-count heuristic) are
+	// routed to the Intune dynamic-challenge validator.
+	//
+	// SCEP RFC 8894 + Intune master bundle Phase 8.8: per-profile dispatch
+	// is what makes the heterogeneous-fleet story work — an operator
+	// running corp-laptops via Intune AND IoT devices via static challenge
+	// configures Intune-mode on the corp profile only; the IoT profile's
+	// PKCSReq path skips the Intune dispatcher entirely.
+	Intune SCEPIntuneProfileConfig
+}
+
+// SCEPIntuneProfileConfig is the per-profile Microsoft Intune Certificate
+// Connector integration sub-block on SCEPProfileConfig.
+//
+// SCEP RFC 8894 + Intune master bundle Phase 8.1.
+//
+// All fields here are populated from CERTCTL_SCEP_PROFILE__INTUNE_*
+// env vars (e.g. CERTCTL_SCEP_PROFILE_CORP_INTUNE_ENABLED=true). Per-profile
+// overrides mean an operator with two Intune-backed profiles (corp + iot,
+// say) can pin distinct Connectors + audiences + rate limits per fleet.
+type SCEPIntuneProfileConfig struct {
+	// Enabled gates the Intune dynamic-challenge validation path. When
+	// false (default), this profile honors only the static ChallengePassword.
+	// When true, ConnectorCertPath becomes a required boot gate.
+	Enabled bool
+
+	// ConnectorCertPath is the filesystem path to a PEM bundle of one or
+	// more Microsoft Intune Certificate Connector signing certs. Required
+	// when Enabled=true. Reloaded on SIGHUP via the per-profile
+	// TrustAnchorHolder wired in cmd/server/main.go.
+	ConnectorCertPath string
+
+	// Audience is the expected "aud" claim value in the Intune challenge —
+	// typically the public SCEP endpoint URL the Connector is configured to
+	// call (e.g. "https://certctl.example.com/scep/corp"). Defaults to
+	// empty (audience check disabled) for proxy / load-balancer scenarios
+	// where the URL the Connector saw isn't the URL we see; operators
+	// who pin a public URL here gain defense-in-depth against challenge
+	// re-use across endpoints.
+	Audience string
+
+	// ChallengeValidity caps the maximum age of an Intune challenge, on
+	// top of the challenge's own iat/exp claims. Default 60 minutes per
+	// Microsoft's published Connector defaults — operators may want a
+	// stricter cap to reduce the replay-window exposure on a stolen
+	// challenge. Zero means "use Connector's exp claim only" (no extra cap).
+	ChallengeValidity time.Duration
+
+	// PerDeviceRateLimit24h caps the number of enrollments per
+	// (claim.Subject, claim.Issuer) pair in any rolling 24-hour window.
+	// Default 3 (covers legitimate first-cert + recovery + post-wipe
+	// re-enrollment, blocks bulk-enumeration from a compromised Connector
+	// signing key). Zero means "unlimited" (defense-in-depth disabled;
+	// not recommended for production).
+	PerDeviceRateLimit24h int
 }
 
 // NetworkScanConfig controls the server-side active TLS scanner.
@@ -1448,6 +1507,14 @@ func loadSCEPProfilesFromEnv() []SCEPProfileConfig {
 			// SCEP RFC 8894 Phase 6.5: opt-in mTLS sibling route.
 			MTLSEnabled:                 getEnvBool("CERTCTL_SCEP_PROFILE_"+envName+"_MTLS_ENABLED", false),
 			MTLSClientCATrustBundlePath: getEnv("CERTCTL_SCEP_PROFILE_"+envName+"_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH", ""),
+			// SCEP RFC 8894 Phase 8.1: per-profile Intune Connector dispatch.
+			Intune: SCEPIntuneProfileConfig{
+				Enabled:               getEnvBool("CERTCTL_SCEP_PROFILE_"+envName+"_INTUNE_ENABLED", false),
+				ConnectorCertPath:     getEnv("CERTCTL_SCEP_PROFILE_"+envName+"_INTUNE_CONNECTOR_CERT_PATH", ""),
+				Audience:              getEnv("CERTCTL_SCEP_PROFILE_"+envName+"_INTUNE_AUDIENCE", ""),
+				ChallengeValidity:     getEnvDuration("CERTCTL_SCEP_PROFILE_"+envName+"_INTUNE_CHALLENGE_VALIDITY", 60*time.Minute),
+				PerDeviceRateLimit24h: getEnvInt("CERTCTL_SCEP_PROFILE_"+envName+"_INTUNE_PER_DEVICE_RATE_LIMIT_24H", 3),
+			},
 		})
 	}
 	return out
@@ -1706,6 +1773,25 @@ func (c *Config) Validate() error {
 			if p.MTLSEnabled && p.MTLSClientCATrustBundlePath == "" {
 				return fmt.Errorf("SCEP profile %d (PathID=%q) has MTLSEnabled=true but MTLS_CLIENT_CA_TRUST_BUNDLE_PATH is empty — refuse to start: the mTLS sibling route /scep-mtls/%s would have no client-cert trust anchor", i, p.PathID, p.PathID)
 			}
+			// Phase 8.1: when Intune is enabled, the Connector trust anchor
+			// path must be set. Preflight in cmd/server/main.go validates the
+			// file itself (intune.LoadTrustAnchor: exists, parseable PEM,
+			// ≥1 CERTIFICATE block, none expired); this gate is the
+			// structural-config refuse, defense in depth — without it an
+			// operator who flips INTUNE_ENABLED=true but forgets to set
+			// CONNECTOR_CERT_PATH would get every Intune enrollment
+			// rejected at runtime with no trust anchor configured (much
+			// worse failure mode than failing fast at boot).
+			if p.Intune.Enabled && p.Intune.ConnectorCertPath == "" {
+				return fmt.Errorf("SCEP profile %d (PathID=%q) has INTUNE_ENABLED=true but INTUNE_CONNECTOR_CERT_PATH is empty — refuse to start: the Intune dynamic-challenge validator would have no trust anchor and reject every Microsoft Intune enrollment", i, p.PathID)
+			}
+			// Phase 8.6: a non-zero rate limit must be sane. Negative is a
+			// config typo; positive values are the per-(Subject,Issuer)
+			// 24-hour cap; zero means 'disabled' (allowed for tests + the
+			// rare operator who wants no per-device cap).
+			if p.Intune.PerDeviceRateLimit24h < 0 {
+				return fmt.Errorf("SCEP profile %d (PathID=%q) has INTUNE_PER_DEVICE_RATE_LIMIT_24H=%d — refuse to start: must be ≥0 (zero disables the per-device cap, positive values enforce it)", i, p.PathID, p.Intune.PerDeviceRateLimit24h)
+			}
 		}
 	}
 
diff --git a/internal/scep/intune/rate_limit.go b/internal/scep/intune/rate_limit.go
new file mode 100644
index 0000000..6026596
--- /dev/null
+++ b/internal/scep/intune/rate_limit.go
@@ -0,0 +1,193 @@
+package intune
+
+import (
+	"errors"
+	"sync"
+	"time"
+)
+
+// SCEP RFC 8894 + Intune master bundle Phase 8.6.
+//
+// PerDeviceRateLimiter is the second line of defense behind the replay cache
+// from Phase 7. The replay cache catches the same challenge being submitted
+// twice (within the challenge TTL); this rate limiter catches a compromised
+// Connector signing key (or a stolen key+cert pair) issuing many DIFFERENT
+// valid challenges for the same device subject in a short window.
+//
+// Threat model:
+//
+//   - Replay cache (Phase 7): nonce-keyed; catches duplicate submission.
+//   - This limiter: (Subject, Issuer)-keyed; catches enrollment-flooding.
+//
+// Default: 3 enrollments per (device GUID, Connector identity) per 24h.
+//
+// Sizing: 100,000 distinct device entries (matches the replay cache cap).
+// At-cap: oldest entry evicted (small janitor pass) to avoid unbounded
+// memory growth on a fleet that grows past the cap.
+//
+// Why a hand-rolled sliding-window log instead of pulling in golang.org/x/time/rate:
+// the rate package is in go.sum as an indirect transitive but NOT a direct
+// dep. Adding it would create a new direct dep relationship for ~30 LoC of
+// state machine. The hand-rolled version below uses only stdlib (sync.Mutex
+// + time.Time arithmetic) and is small enough to fit on one screen.
+//
+// Algorithm: each (Subject, Issuer) key maps to a bucket holding a window's
+// worth of recent enrollment timestamps. On Allow, the bucket prunes
+// timestamps older than (now - window) and either appends the current
+// timestamp and returns nil, or returns ErrRateLimited when the post-prune
+// count is already at the cap. This is the "sliding window log" rate
+// limiter — exact (no token-leak rounding); O(N_per_key) per-call but N is
+// bounded by the cap (3 by default), so effectively O(1).
+
+// ErrRateLimited is the typed error returned when the per-device rate limit
+// fires. The handler maps this to a CertRep FAILURE with badRequest failInfo
+// + the `rate_limited` metric label.
+var ErrRateLimited = errors.New("intune: per-device rate limit exceeded for this (subject, issuer) within the configured window")
+
+// PerDeviceRateLimiter is a sliding-window-log rate limiter keyed by
+// (Subject, Issuer) tuples derived from a parsed challenge claim.
+//
+// Concurrency: the limiter is safe for concurrent Allow calls. The internal
+// map is guarded by a mutex; the per-key slices are mutated only while the
+// mutex is held.
+type PerDeviceRateLimiter struct { + mu sync.Mutex + buckets map[string][]time.Time // key → sliding window of timestamps + maxN int // max enrollments per window + window time.Duration // window length (default 24h) + cap int // max keys before at-cap eviction kicks in + disabled bool // maxN ≤ 0 → all Allow calls return nil +} + +// NewPerDeviceRateLimiter returns a limiter with the given per-key cap + +// window. maxN ≤ 0 disables the limiter (all Allow calls return nil); this +// is operator opt-out for the rare case where the per-device cap is +// undesirable (e.g. test harnesses, sketchpad deploys). +// +// Window defaults to 24h when ≤ 0. Map cap defaults to 100,000 when ≤ 0 +// (matches the replay cache cap; see internal/scep/intune/replay.go). +func NewPerDeviceRateLimiter(maxN int, window time.Duration, mapCap int) *PerDeviceRateLimiter { + if window <= 0 { + window = 24 * time.Hour + } + if mapCap <= 0 { + mapCap = 100_000 + } + return &PerDeviceRateLimiter{ + buckets: make(map[string][]time.Time), + maxN: maxN, + window: window, + cap: mapCap, + disabled: maxN <= 0, + } +} + +// Allow checks whether an enrollment for the given (subject, issuer) tuple +// is permitted right now. Returns nil when allowed (and records the timestamp +// in the bucket) or ErrRateLimited when the bucket is at maxN. +// +// Empty subject is treated as "skip the limiter" — the caller's claim +// validation should have rejected an empty-subject claim already; this is +// belt-and-suspenders to prevent a single empty-subject bucket from +// becoming a fleet-wide chokepoint. The Connector emits a non-empty subject +// (device GUID) on every legitimate challenge. +func (l *PerDeviceRateLimiter) Allow(subject, issuer string, now time.Time) error { + if l.disabled { + return nil + } + if subject == "" { + // Caller's claim validation should reject empty-subject upstream; + // this short-circuit is defense-in-depth so a misconfigured + // Connector can't DoS us via the rate-limit path.
+ return nil + } + key := subject + "|" + issuer + + l.mu.Lock() + defer l.mu.Unlock() + + // At-cap eviction: only when the map is full AND this key is new, drop + // the bucket whose newest timestamp is the oldest (O(N_keys), rare). An + // existing key must not trigger it: that could evict (reset) its own bucket. + if _, exists := l.buckets[key]; !exists && len(l.buckets) >= l.cap { + l.evictOldestLocked(now) + } + + bucket := l.buckets[key] + bucket = pruneOlderThan(bucket, now.Add(-l.window)) + + if len(bucket) >= l.maxN { + // Don't append; over the limit. Persist the pruned bucket so the + // next call sees the most-recently-pruned state. + l.buckets[key] = bucket + return ErrRateLimited + } + + bucket = append(bucket, now) + l.buckets[key] = bucket + return nil +} + +// pruneOlderThan returns the slice with all entries strictly before +// `cutoff` removed. Preserves order (timestamps are appended in increasing +// time, so a single linear scan from the front suffices). +func pruneOlderThan(b []time.Time, cutoff time.Time) []time.Time { + i := 0 + for i < len(b) && b[i].Before(cutoff) { + i++ + } + if i == 0 { + return b + } + // Copy-shrink to release the underlying-array memory eventually + // (otherwise the slice would hold a reference to the older entries + // indefinitely until a re-allocation). + out := make([]time.Time, len(b)-i) + copy(out, b[i:]) + return out +} + +// evictOldestLocked drops the map entry whose newest timestamp is the +// oldest. Called under l.mu. O(N_keys) per eviction; at-cap is rare in +// practice (caps are sized for fleet steady-state). +func (l *PerDeviceRateLimiter) evictOldestLocked(now time.Time) { + var ( + oldestKey string + oldestTs time.Time + first = true + ) + for k, b := range l.buckets { + if len(b) == 0 { + // Empty bucket — drop it immediately, no candidate scan needed.
+ delete(l.buckets, k) + return + } + newest := b[len(b)-1] + if first || newest.Before(oldestTs) { + oldestKey = k + oldestTs = newest + first = false + } + } + if oldestKey != "" { + delete(l.buckets, oldestKey) + } + // Suppress unused-parameter warning for `now` in case the eviction + // strategy changes (e.g. swap to LRU keyed by time of last Allow). + _ = now +} + +// Len returns the approximate number of distinct (subject, issuer) keys +// currently tracked. For observability + tests; not load-stable under +// concurrent Allow calls. +func (l *PerDeviceRateLimiter) Len() int { + l.mu.Lock() + defer l.mu.Unlock() + return len(l.buckets) +} + +// Disabled reports whether the limiter is in opt-out mode (maxN ≤ 0). +// Useful for handler-side gating + admin-endpoint observability. +func (l *PerDeviceRateLimiter) Disabled() bool { + return l.disabled +} diff --git a/internal/scep/intune/rate_limit_test.go b/internal/scep/intune/rate_limit_test.go new file mode 100644 index 0000000..e028bca --- /dev/null +++ b/internal/scep/intune/rate_limit_test.go @@ -0,0 +1,190 @@ +package intune + +import ( + "errors" + "fmt" + "sync" + "testing" + "time" +) + +func TestPerDeviceRateLimiter_AllowsUpToCap(t *testing.T) { + l := NewPerDeviceRateLimiter(3, 24*time.Hour, 10) + now := time.Now() + for i := 0; i < 3; i++ { + if err := l.Allow("device-1", "issuer-A", now.Add(time.Duration(i)*time.Minute)); err != nil { + t.Fatalf("call %d should be allowed: %v", i+1, err) + } + } + if err := l.Allow("device-1", "issuer-A", now.Add(4*time.Minute)); !errors.Is(err, ErrRateLimited) { + t.Fatalf("4th call should be rate-limited; got %v", err) + } +} + +func TestPerDeviceRateLimiter_DistinctKeysIndependent(t *testing.T) { + l := NewPerDeviceRateLimiter(1, 24*time.Hour, 10) + now := time.Now() + + if err := l.Allow("device-1", "issuer-A", now); err != nil { + t.Fatalf("first allow: %v", err) + } + // Different subject — independent bucket. 
+ if err := l.Allow("device-2", "issuer-A", now); err != nil { + t.Fatalf("different subject must have its own bucket: %v", err) + } + // Different issuer — also independent. + if err := l.Allow("device-1", "issuer-B", now); err != nil { + t.Fatalf("different issuer must have its own bucket: %v", err) + } + // Same key as call 1 — must be limited. + if err := l.Allow("device-1", "issuer-A", now.Add(1*time.Second)); !errors.Is(err, ErrRateLimited) { + t.Fatalf("repeat key should be limited; got %v", err) + } +} + +func TestPerDeviceRateLimiter_WindowExpiry(t *testing.T) { + l := NewPerDeviceRateLimiter(2, 1*time.Hour, 10) + now := time.Now() + + if err := l.Allow("dev", "iss", now); err != nil { + t.Fatal(err) + } + if err := l.Allow("dev", "iss", now.Add(30*time.Minute)); err != nil { + t.Fatal(err) + } + // Inside window — limited. + if err := l.Allow("dev", "iss", now.Add(45*time.Minute)); !errors.Is(err, ErrRateLimited) { + t.Fatalf("inside-window 3rd call should be limited: %v", err) + } + // Past window — slots reopen. + if err := l.Allow("dev", "iss", now.Add(2*time.Hour)); err != nil { + t.Fatalf("past-window call should be allowed (window reset): %v", err) + } +} + +func TestPerDeviceRateLimiter_DisabledBypass(t *testing.T) { + l := NewPerDeviceRateLimiter(0, 24*time.Hour, 10) // maxN=0 → disabled + if !l.Disabled() { + t.Fatal("limiter with maxN=0 must report Disabled()=true") + } + now := time.Now() + for i := 0; i < 100; i++ { + if err := l.Allow("dev", "iss", now); err != nil { + t.Fatalf("disabled limiter must allow everything: %v", err) + } + } + // Disabled limiter doesn't track buckets. 
+ if got := l.Len(); got != 0 { + t.Errorf("disabled limiter Len() = %d, want 0", got) + } +} + +func TestPerDeviceRateLimiter_NegativeCapDisabled(t *testing.T) { + l := NewPerDeviceRateLimiter(-1, 24*time.Hour, 10) + if !l.Disabled() { + t.Fatal("negative maxN must produce a disabled limiter") + } +} + +func TestPerDeviceRateLimiter_EmptySubjectShortCircuits(t *testing.T) { + // Empty subject is the caller's defense-in-depth case (claim validation + // upstream should reject empty-subject claims first). Limiter must not + // build a single shared bucket keyed by empty-subject — that would + // be a fleet-wide chokepoint. + l := NewPerDeviceRateLimiter(1, 24*time.Hour, 10) + now := time.Now() + for i := 0; i < 50; i++ { + if err := l.Allow("", "iss", now); err != nil { + t.Fatalf("empty subject must short-circuit (call %d): %v", i, err) + } + } + if got := l.Len(); got != 0 { + t.Errorf("Len after 50 empty-subject calls = %d, want 0 (no bucket created)", got) + } +} + +func TestPerDeviceRateLimiter_DefaultCapsHonored(t *testing.T) { + l := NewPerDeviceRateLimiter(5, 0, 0) // window=0 → 24h default; cap=0 → 100k default + if l.window != 24*time.Hour { + t.Errorf("default window = %v, want 24h", l.window) + } + if l.cap != 100_000 { + t.Errorf("default cap = %d, want 100000", l.cap) + } +} + +func TestPerDeviceRateLimiter_MapCapEvictsOldest(t *testing.T) { + // Cap of 3 keys to exercise the eviction branch deterministically. + l := NewPerDeviceRateLimiter(2, 1*time.Hour, 3) + now := time.Now() + + // Insert 3 distinct keys with increasing timestamps. + for i := 0; i < 3; i++ { + key := fmt.Sprintf("dev-%d", i) + if err := l.Allow(key, "iss", now.Add(time.Duration(i)*time.Minute)); err != nil { + t.Fatalf("insert %d: %v", i, err) + } + } + if l.Len() != 3 { + t.Fatalf("Len = %d, want 3", l.Len()) + } + + // 4th key forces eviction of dev-0 (its newest timestamp is oldest). 
+ if err := l.Allow("dev-3", "iss", now.Add(10*time.Minute)); err != nil { + t.Fatalf("4th-key insert: %v", err) + } + if l.Len() != 3 { + t.Errorf("Len after at-cap insert = %d, want 3 (cap honored)", l.Len()) + } +} + +func TestPerDeviceRateLimiter_ConcurrentRaceFree(t *testing.T) { + if testing.Short() { + t.Skip("race-style test under -short") + } + l := NewPerDeviceRateLimiter(50, 24*time.Hour, 10000) + var wg sync.WaitGroup + for g := 0; g < 20; g++ { + wg.Add(1) + go func(id int) { + defer wg.Done() + now := time.Now() + key := fmt.Sprintf("dev-%d", id) + for i := 0; i < 30; i++ { + _ = l.Allow(key, "iss", now) + } + }(g) + } + wg.Wait() + if got := l.Len(); got != 20 { + t.Errorf("expected 20 distinct keys; got %d", got) + } +} + +func TestPruneOlderThan(t *testing.T) { + t0 := time.Now() + in := []time.Time{ + t0.Add(-3 * time.Hour), // pruned (older than cutoff) + t0.Add(-2 * time.Hour), // pruned (older than cutoff) + t0.Add(-1 * time.Hour), // survives (-60m is NEWER than the -90m cutoff) + t0.Add(-30 * time.Minute), // survives + t0, // survives + } + out := pruneOlderThan(in, t0.Add(-90*time.Minute)) + if len(out) != 3 { + t.Fatalf("len(out) = %d, want 3 (-1h, -30m, t0 all newer than -90m cutoff)", len(out)) + } + if !out[0].Equal(t0.Add(-1 * time.Hour)) { + t.Errorf("out[0] = %v, want -1h (oldest surviving entry)", out[0]) + } +} + +func TestPruneOlderThan_NoOpWhenNothingToPrune(t *testing.T) { + t0 := time.Now() + in := []time.Time{t0.Add(-1 * time.Minute), t0} + out := pruneOlderThan(in, t0.Add(-1*time.Hour)) + // Same slice header (no copy needed). 
+ if len(out) != len(in) { + t.Fatalf("len(out) = %d, want %d", len(out), len(in)) + } +} diff --git a/internal/scep/intune/trust_anchor_holder.go b/internal/scep/intune/trust_anchor_holder.go new file mode 100644 index 0000000..f9fdfad --- /dev/null +++ b/internal/scep/intune/trust_anchor_holder.go @@ -0,0 +1,143 @@ +package intune + +import ( + "crypto/x509" + "errors" + "log/slog" + "os" + "os/signal" + "sync" + "syscall" +) + +// TrustAnchorHolder is the SIGHUP-reloadable wrapper around a per-profile +// Intune Connector trust anchor pool. +// +// SCEP RFC 8894 + Intune master bundle Phase 8.5. +// +// Mirrors the shape established by `cmd/server/tls.go::certHolder` for the +// server TLS cert: an RWMutex-guarded pool, a Get accessor that's safe for +// concurrent callers from the request path, a Reload that re-reads the file +// and atomically swaps the slice on success (failure leaves the OLD pool in +// place so a bad reload doesn't take Intune enrollment down), and a +// WatchSIGHUP goroutine that responds to the same SIGHUP the operator uses +// to rotate the server TLS cert. +// +// Why SIGHUP specifically (vs fsnotify or a polling loop): SIGHUP is the +// repo-established convention (see cmd/server/tls.go). fsnotify would add a +// new direct dep + complicate the cleanup story. The operator's Connector- +// rotation script writes the new PEM bundle then sends SIGHUP — the same +// signal that already rotates the server TLS cert; each holder swaps its own pool atomically. +// +// Concurrency contract: +// - Get returns the pool slice header by value; the slice itself is +// immutable per-snapshot (Reload swaps a fresh slice rather than +// mutating the existing one). Callers may iterate the returned slice +// without holding any lock, but must not modify it. +// - Reload acquires a write lock briefly for the swap. Concurrent Get +// calls block only for that swap window (microseconds). +// - A WatchSIGHUP goroutine runs its Reloads one at a time.
+type TrustAnchorHolder struct { + mu sync.RWMutex + certs []*x509.Certificate + path string + logger *slog.Logger +} + +// NewTrustAnchorHolder loads the trust bundle and returns a holder. Returns +// the same fail-loud error LoadTrustAnchor does on initial load — the +// startup gate at cmd/server/main.go is supposed to refuse boot when this +// fails. Subsequent Reload errors are non-fatal (logged + old pool retained). +// +// The logger is required (never nil); the caller passes a per-profile +// scoped logger so SIGHUP-reload events show the PathID for triage. +func NewTrustAnchorHolder(path string, logger *slog.Logger) (*TrustAnchorHolder, error) { + if logger == nil { + return nil, errors.New("intune: TrustAnchorHolder requires a non-nil logger") + } + certs, err := LoadTrustAnchor(path) + if err != nil { + return nil, err + } + return &TrustAnchorHolder{ + certs: certs, + path: path, + logger: logger, + }, nil +} + +// Get returns the current trust anchor pool. Safe for concurrent callers; +// the slice header is returned by value and the underlying slice is +// immutable per-snapshot (Reload swaps a fresh slice, doesn't mutate in +// place — see Reload). +func (h *TrustAnchorHolder) Get() []*x509.Certificate { + h.mu.RLock() + defer h.mu.RUnlock() + return h.certs +} + +// Path returns the on-disk path the holder reloads from. Useful for +// observability (admin endpoints, log lines) without exposing the cert +// pool itself. +func (h *TrustAnchorHolder) Path() string { + return h.path +} + +// Reload re-reads the trust anchor file at h.path and atomically swaps the +// pool. Returns the parse error if the new file is invalid; the OLD pool +// stays in place so a bad reload doesn't take Intune enrollment down. +// +// Same fail-safe pattern as cmd/server/tls.go::(*certHolder).Reload: a +// rotation that leaves an unparseable file would otherwise take Intune +// enrollment down mid-rotation. A partially written file may still hold +// complete PEM blocks that parse cleanly and get swapped in, so rotation +// scripts should write a temp file in the same directory and rename it into place.
Logging + retaining the old pool gives the +// operator a bounded window to fix and re-SIGHUP. +func (h *TrustAnchorHolder) Reload() error { + certs, err := LoadTrustAnchor(h.path) + if err != nil { + return err + } + h.mu.Lock() + h.certs = certs + h.mu.Unlock() + return nil +} + +// WatchSIGHUP installs a signal handler that calls Reload on each SIGHUP. +// The returned stop closes the internal done channel; the goroutine then +// calls signal.Stop and exits asynchronously. Call stop at most once. +// +// Errors from Reload are logged but do not terminate the watcher — the +// operator can fix the file and send another SIGHUP. Mirrors the +// (*certHolder).watchSIGHUP contract exactly. +// +// Multiple holders can coexist: each registers its own goroutine on the +// same SIGHUP signal. signal.Notify multicasts to every registered +// channel, so a single SIGHUP reloads every per-profile Intune trust +// anchor PLUS the server TLS cert in one operator action — exactly the +// design requirement (one SIGHUP rotates everything).
+func (h *TrustAnchorHolder) WatchSIGHUP() (stop func()) { + ch := make(chan os.Signal, 1) + signal.Notify(ch, syscall.SIGHUP) + done := make(chan struct{}) + go func() { + for { + select { + case <-ch: + if err := h.Reload(); err != nil { + h.logger.Error("Intune trust anchor reload failed; continuing with previous pool", + "error", err, + "path", h.path) + continue + } + h.logger.Info("Intune trust anchor reloaded via SIGHUP", + "path", h.path, + "certs_loaded", len(h.Get())) + case <-done: + signal.Stop(ch) + return + } + } + }() + return func() { close(done) } +} diff --git a/internal/scep/intune/trust_anchor_holder_test.go b/internal/scep/intune/trust_anchor_holder_test.go new file mode 100644 index 0000000..6c08a8f --- /dev/null +++ b/internal/scep/intune/trust_anchor_holder_test.go @@ -0,0 +1,234 @@ +package intune + +import ( + "crypto/ecdsa" + "crypto/elliptic" + "crypto/rand" + "crypto/x509" + "crypto/x509/pkix" + "encoding/pem" + "io" + "log/slog" + "math/big" + "os" + "path/filepath" + "strings" + "syscall" + "testing" + "time" +) + +// silentTestLogger returns a logger that drops everything; the SIGHUP watcher +// path emits Info logs we don't want fouling test output. +func silentTestLogger() *slog.Logger { + return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelError + 10})) +} + +// writeTestBundle writes a PEM bundle of the given certs at path with mode 0600. +func writeTestBundle(t *testing.T, path string, certs []*x509.Certificate) { + t.Helper() + body := []byte{} + for _, c := range certs { + body = append(body, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: c.Raw})...) + } + if err := os.WriteFile(path, body, 0o600); err != nil { + t.Fatalf("WriteFile: %v", err) + } +} + +// freshHolderCert is a small factory for a self-signed EC cert with a +// caller-controlled CN + lifetime. Used by Reload tests that swap the +// on-disk pool between calls.
+func freshHolderCert(t *testing.T, cn string, notAfter time.Time) *x509.Certificate { + t.Helper() + key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) + if err != nil { + t.Fatalf("ecdsa.GenerateKey: %v", err) + } + tmpl := &x509.Certificate{ + SerialNumber: big.NewInt(time.Now().UnixNano()), + Subject: pkix.Name{CommonName: cn}, + NotBefore: time.Now().Add(-1 * time.Hour), + NotAfter: notAfter, + } + der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key) + if err != nil { + t.Fatalf("x509.CreateCertificate: %v", err) + } + cert, err := x509.ParseCertificate(der) + if err != nil { + t.Fatalf("x509.ParseCertificate: %v", err) + } + return cert +} + +func TestTrustAnchorHolder_NewLoadsBundle(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "intune-trust.pem") + cert := freshHolderCert(t, "initial-conn", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{cert}) + + holder, err := NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatalf("NewTrustAnchorHolder: %v", err) + } + got := holder.Get() + if len(got) != 1 || got[0].Subject.CommonName != "initial-conn" { + t.Fatalf("Get returned %#v, want one cert with CN=initial-conn", got) + } + if holder.Path() != path { + t.Errorf("Path = %q, want %q", holder.Path(), path) + } +} + +func TestTrustAnchorHolder_NewRequiresLogger(t *testing.T) { + if _, err := NewTrustAnchorHolder("/nonexistent", nil); err == nil { + t.Fatal("nil logger must error") + } +} + +func TestTrustAnchorHolder_NewSurfacesLoadError(t *testing.T) { + if _, err := NewTrustAnchorHolder("/path/that/does/not/exist.pem", silentTestLogger()); err == nil { + t.Fatal("missing file must error") + } +} + +func TestTrustAnchorHolder_ReloadHappyPath(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "trust.pem") + c1 := freshHolderCert(t, "rev-1", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{c1}) + + h, err := 
NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatal(err) + } + + // Rotate on disk and call Reload. + c2 := freshHolderCert(t, "rev-2", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{c2}) + if err := h.Reload(); err != nil { + t.Fatalf("Reload: %v", err) + } + got := h.Get() + if len(got) != 1 || got[0].Subject.CommonName != "rev-2" { + t.Errorf("after Reload Get = %#v, want one cert CN=rev-2", got) + } +} + +func TestTrustAnchorHolder_ReloadKeepsOldOnFailure(t *testing.T) { + // Mid-rotation half-file: operator overwrites the bundle with garbage + // → Reload errors → holder must still serve the OLD pool. Without this + // fail-safe a single typo would take Intune enrollment down for the + // whole window until a re-rotate. + dir := t.TempDir() + path := filepath.Join(dir, "trust.pem") + good := freshHolderCert(t, "stable", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{good}) + + h, err := NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatal(err) + } + + // Overwrite with content that LoadTrustAnchor will reject (no PEM blocks). + if err := os.WriteFile(path, []byte("garbage"), 0o600); err != nil { + t.Fatal(err) + } + if err := h.Reload(); err == nil { + t.Fatal("Reload from garbage file must error") + } + + // Old pool still served. 
+ got := h.Get() + if len(got) != 1 || got[0].Subject.CommonName != "stable" { + t.Errorf("after failed Reload Get should still be the pre-Reload pool; got %#v", got) + } +} + +func TestTrustAnchorHolder_ReloadKeepsOldOnExpired(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "trust.pem") + good := freshHolderCert(t, "still-valid", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{good}) + + h, err := NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatal(err) + } + + // Operator rotates to a cert that's already expired (their script + // pulled an old bundle by mistake). Reload should error AND the holder + // should retain the previous good pool — exactly the fail-safe semantics + // LoadTrustAnchor enforces at startup. + expired := freshHolderCert(t, "expired-conn", time.Now().Add(-1*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{expired}) + + if err := h.Reload(); err == nil { + t.Fatal("Reload with expired cert must error") + } + if !strings.Contains(h.Get()[0].Subject.CommonName, "still-valid") { + t.Errorf("after expired-cert Reload, holder should retain old pool") + } +} + +func TestTrustAnchorHolder_WatchSIGHUPReloadsPool(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "trust.pem") + c1 := freshHolderCert(t, "rev-pre-sighup", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{c1}) + + h, err := NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatal(err) + } + stop := h.WatchSIGHUP() + defer stop() + + // Rotate on disk, then send SIGHUP to our own process and poll for the swap. + c2 := freshHolderCert(t, "rev-post-sighup", time.Now().Add(30*24*time.Hour)) + writeTestBundle(t, path, []*x509.Certificate{c2}) + if err := syscall.Kill(syscall.Getpid(), syscall.SIGHUP); err != nil { + t.Fatalf("send SIGHUP: %v", err) + } + + // Poll for up to 2 seconds. 
+ deadline := time.Now().Add(2 * time.Second) + for { + got := h.Get() + if len(got) == 1 && got[0].Subject.CommonName == "rev-post-sighup" { + return + } + if time.Now().After(deadline) { + t.Fatalf("post-SIGHUP pool not swapped in 2s; current CN=%q", got[0].Subject.CommonName) + } + time.Sleep(20 * time.Millisecond) + } +} + +func TestTrustAnchorHolder_WatchSIGHUPStopIsClean(t *testing.T) { + // Mirrors cmd/server/tls_test.go::TestCertHolder_WatchSIGHUP_StopExits: + // we do NOT fire a SIGHUP after stop(), because once signal.Stop has + // removed our handler the kernel's default action on SIGHUP is to + // terminate the process; it would kill the test runner. stop() only + // closes the done channel (the goroutine calls signal.Stop asynchronously, + // hence the sleep below), so the contract we pin is "stop() is safe": + // the holder still serves the original cert without panic. + dir := t.TempDir() + path := filepath.Join(dir, "trust.pem") + writeTestBundle(t, path, []*x509.Certificate{ + freshHolderCert(t, "stop-test", time.Now().Add(30*24*time.Hour)), + }) + + h, err := NewTrustAnchorHolder(path, silentTestLogger()) + if err != nil { + t.Fatal(err) + } + stop := h.WatchSIGHUP() + stop() + time.Sleep(50 * time.Millisecond) // let the goroutine fully exit + + if cn := h.Get()[0].Subject.CommonName; cn != "stop-test" { + t.Errorf("after stop CN = %q, want unchanged stop-test", cn) + } +} diff --git a/internal/service/scep.go b/internal/service/scep.go index fd9609e..bce1ed8 100644 --- a/internal/service/scep.go +++ b/internal/service/scep.go @@ -5,17 +5,30 @@ import ( "crypto/subtle" "crypto/x509" "encoding/pem" + "errors" "fmt" "log/slog" "strings" + "time" "github.com/shankar0123/certctl/internal/domain" "github.com/shankar0123/certctl/internal/repository" + "github.com/shankar0123/certctl/internal/scep/intune" ) // SCEPService implements the SCEP (RFC 8894) enrollment protocol.
// It delegates certificate operations to an existing IssuerConnector and records // enrollment events in the audit trail. +// +// SCEP RFC 8894 + Intune master bundle Phase 8.3 + 8.4 + 8.7: per-profile +// Intune dynamic-challenge dispatcher (intuneEnabled+intuneTrust+...); +// audit action `scep_pkcsreq_intune` flows through the existing +// auditService; per-device rate limit + nil-default compliance hook seam. +// +// Lifecycle: a service instance per SCEP profile (Phase 1.5). The Intune +// fields are populated only on profiles where INTUNE_ENABLED=true; on the +// rest they're nil/empty and dispatchIntuneChallenge short-circuits to the +// existing static-challenge path. type SCEPService struct { issuer IssuerConnector issuerID string @@ -24,6 +37,281 @@ type SCEPService struct { profileID string // optional: constrain enrollments to a specific profile profileRepo repository.CertificateProfileRepository challengePassword string // shared secret for enrollment authentication + + // Intune dispatcher state (Phase 8.3+8.6+8.7). All nil/zero when this + // profile has INTUNE_ENABLED=false; all populated when true. The + // dispatcher in PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope + // gates on intuneEnabled before consulting any of these. + intuneEnabled bool + intuneTrust *intune.TrustAnchorHolder // SIGHUP-reloadable trust pool + intuneAudience string // expected "aud" claim; empty disables the check + intuneValidity time.Duration // optional max challenge age (now - iat); ≤0 disables + intuneReplayCache *intune.ReplayCache // nonce-keyed; catches duplicate submission + intuneRateLimiter *intune.PerDeviceRateLimiter + complianceCheck ComplianceCheck // V3-Pro plug-in seam; nil-default no-op +} + +// ComplianceCheck is the optional gate that pings Intune's compliance API +// (or any custom policy backend) to confirm the device is in good standing +// before issuing a cert.
When nil (the V2-free default), the gate is a +// no-op and enrollments proceed solely on challenge validation + +// claim-binding + replay + per-device rate limit. +// +// SCEP RFC 8894 + Intune master bundle Phase 8.7 — V3-Pro plug-in seam. +// +// V3-Pro plugs in here via a new module that calls Microsoft Graph's +// /deviceManagement/managedDevices/{id}/compliancePolicyStates endpoint +// (or equivalent), wires SetComplianceCheck on the service, and +// short-circuits non-compliant device enrollments with a SCEP CertRep +// FAILURE/badRequest plus a compliance_failed audit event + metric. +// +// Return contract: +// +// - compliant=true, err=nil → proceed with enrollment. +// - compliant=false, err=nil → CertRep FAILURE + compliance_failed metric; +// the reason string flows into the audit event for ops triage. +// - compliant=*, err!=nil → fail-safe (deny) by default; the V3-Pro +// module is responsible for a more nuanced "permit on API failure" +// mode if its policy demands one. +// +// Leaving the hook here means the V3-Pro work is plug-in code, not a +// dispatcher refactor. The cost today is one struct field + one setter + +// one nil-guarded call site. Zero behavior change in V2. +type ComplianceCheck func(ctx context.Context, claim *intune.ChallengeClaim) (compliant bool, reason string, err error) + +// SetComplianceCheck installs the V3-Pro compliance gate. Idempotent; +// passing nil re-disables the gate (useful for tests). Call it during +// wiring, before the service serves requests: the field is read without a +// lock, so swapping it under live traffic would be a data race. +func (s *SCEPService) SetComplianceCheck(fn ComplianceCheck) { s.complianceCheck = fn } + +// SetIntuneIntegration wires the per-profile Intune dispatcher onto the +// service. Calling it opts this profile in: it sets intuneEnabled=true and +// stores the trust holder + replay cache + rate limiter, which should all be +// populated. There is no enabled parameter; a profile opts out by never calling it.
The +// audience is allowed to be empty (the validator's audience check then +// becomes a no-op, useful for proxy/load-balancer scenarios where the URL +// the Connector saw differs from the URL we see). +// +// Setter injection at wiring time (rather than extra NewSCEPService params) +// keeps the surface stable for the existing callers + lets the per-profile +// loop in cmd/server/main.go build a holder + cache + limiter for each +// Intune-enabled profile. Profiles where INTUNE_ENABLED=false +// simply never call this method. +func (s *SCEPService) SetIntuneIntegration( + trust *intune.TrustAnchorHolder, + audience string, + validity time.Duration, + replayCache *intune.ReplayCache, + rateLimiter *intune.PerDeviceRateLimiter, +) { + s.intuneEnabled = true + s.intuneTrust = trust + s.intuneAudience = audience + s.intuneValidity = validity + s.intuneReplayCache = replayCache + s.intuneRateLimiter = rateLimiter +} + +// IntuneEnabled reports whether this service instance is wired for Intune +// dynamic-challenge dispatch. Useful for handler-layer gating + admin +// endpoints (Phase 9 GUI surface). Always returns false on profiles where +// SetIntuneIntegration was never called. +func (s *SCEPService) IntuneEnabled() bool { return s.intuneEnabled } + +// looksIntuneShaped is the fast pre-check that distinguishes an +// Intune-format challenge from a static challenge password. Intune +// challenges are JWT-like (three base64url segments separated by dots, +// total length > 200 bytes for any reasonable claim payload). Static +// challenges are typically ≤ 64 bytes ASCII. +// +// SCEP RFC 8894 + Intune master bundle Phase 8.3. +// +// The heuristic is allowed to false-positive (the validator catches +// malformed input → ErrChallengeMalformed), but it MUST NOT false-negative +// on real Intune challenges — that would route an Intune challenge to the +// constant-time static compare and reject every enrollment.
Hence the +// generous length threshold (real Intune challenges are typically +// >800 bytes; the 200 floor is well below the smallest plausible v1 +// payload + signature). +func looksIntuneShaped(s string) bool { + if len(s) <= 200 { + return false + } + return strings.Count(s, ".") == 2 +} + +// intuneFailReason maps a typed Intune error to the metric label used in +// `certctl_scep_intune_enrollments_total{status="..."}`. Defaults to +// "malformed" so a previously-unseen error category still surfaces in +// the metric (with a follow-up to add a typed branch here). +func intuneFailReason(err error) string { + switch { + case err == nil: + return "success" + case errors.Is(err, intune.ErrChallengeSignature): + return "signature_invalid" + case errors.Is(err, intune.ErrChallengeExpired): + return "expired" + case errors.Is(err, intune.ErrChallengeNotYetValid): + return "not_yet_valid" + case errors.Is(err, intune.ErrChallengeWrongAudience): + return "wrong_audience" + case errors.Is(err, intune.ErrChallengeReplay): + return "replay" + case errors.Is(err, intune.ErrChallengeUnknownVersion): + return "unknown_version" + case errors.Is(err, intune.ErrChallengeMalformed): + return "malformed" + case errors.Is(err, intune.ErrRateLimited): + return "rate_limited" + case errors.Is(err, intune.ErrClaimCNMismatch), + errors.Is(err, intune.ErrClaimSANDNSMismatch), + errors.Is(err, intune.ErrClaimSANRFC822Mismatch), + errors.Is(err, intune.ErrClaimSANUPNMismatch): + return "claim_mismatch" + default: + return "malformed" + } +} + +// intuneEnrollOutcome is the envelope the dispatcher hands back to its two +// callers (PKCSReq's MVP path + PKCSReqWithEnvelope/RenewalReqWithEnvelope's +// RFC 8894 path). It carries enough to short-circuit OR continue to the +// existing processEnrollment flow: +// +// - decided=false → not Intune-shaped (or Intune disabled); fall through +// to the static-challenge path. 
+// - decided=true, err=nil → Intune validation passed; the caller MUST +// call processEnrollment with auditAction="scep_pkcsreq_intune" +// ("scep_renewalreq_intune" on the RenewalReqWithEnvelope path). +// - decided=true, err!=nil → Intune validation failed; the caller MUST +// short-circuit with the typed error (handler maps to FailInfo). +type intuneEnrollOutcome struct { + decided bool + claim *intune.ChallengeClaim + err error +} + +// dispatchIntuneChallenge runs the full Intune validation pipeline for a +// single PKCSReq or RenewalReq invocation: shape check → ValidateChallenge +// → operator validity cap → DeviceMatchesCSR → replay-cache CheckAndInsert +// → per-device rate limit → optional compliance check. Each failure leg +// emits an audit-friendly log line (Warn, or Error for a missing trust +// holder or a compliance-API fault) and returns an error that +// intuneFailReason maps to a metric label. The returned outcome tells the +// caller whether to short-circuit or continue to enrollment. +// +// Splitting the dispatcher out of PKCSReq* keeps the three call sites +// (PKCSReq, PKCSReqWithEnvelope, RenewalReqWithEnvelope) consistent — every +// path through the Intune mode runs through the same gate sequence so an +// operator gets the same audit shape regardless of which SCEP message +// type the device sent. +func (s *SCEPService) dispatchIntuneChallenge(ctx context.Context, csrPEM string, challengePassword string, transactionID string) intuneEnrollOutcome { + if !s.intuneEnabled || !looksIntuneShaped(challengePassword) { + return intuneEnrollOutcome{decided: false} + } + if s.intuneTrust == nil { + // Defensive: enabled bit was flipped without wiring the trust + // holder. Treat as a hard failure so the operator sees it + // instead of silently falling through to the static path.
+ s.logger.Error("SCEP enrollment rejected: Intune mode enabled but no trust anchor holder wired", + "transaction_id", transactionID) + return intuneEnrollOutcome{decided: true, err: intune.ErrChallengeSignature} + } + + now := time.Now() + trust := s.intuneTrust.Get() + + claim, err := intune.ValidateChallenge(challengePassword, trust, s.intuneAudience, now) + if err != nil { + s.logger.Warn("SCEP enrollment rejected: Intune challenge validation failed", + "transaction_id", transactionID, "reason", intuneFailReason(err), "error", err) + return intuneEnrollOutcome{decided: true, err: err} + } + + // Defense-in-depth validity cap on top of the challenge's own iat/exp. + // When intuneValidity is positive and the claim carries an iat, the + // challenge may be at most intuneValidity old (now - iat <= + // intuneValidity), so an old-but-not-yet-expired challenge (per the + // Connector's exp claim) gets rejected here. A zero iat skips the cap, + // and this check does not bound a future-dated iat. + if s.intuneValidity > 0 && !claim.IssuedAt.IsZero() && now.Sub(claim.IssuedAt) > s.intuneValidity { + err := fmt.Errorf("%w: iat=%s exceeds operator-configured validity cap %s", + intune.ErrChallengeExpired, claim.IssuedAt.Format(time.RFC3339), s.intuneValidity) + s.logger.Warn("SCEP enrollment rejected: Intune challenge older than operator validity cap", + "transaction_id", transactionID, "error", err) + return intuneEnrollOutcome{decided: true, err: err} + } + + // Bind claim ↔ CSR before consuming the replay-cache slot. If the CSR + // doesn't match the claim, we don't want to mark the nonce as seen + // (the next legitimate retry should still work). + csr, perr := parseCSRForIntune(csrPEM) + if perr != nil { + s.logger.Warn("SCEP enrollment rejected: CSR parse failed during Intune dispatch", + "transaction_id", transactionID, "error", perr) + // CSR parse failure surfaces as a "malformed" intune metric label + // (the wrapping helps the audit log distinguish it from a + // challenge-malformed failure).
+ return intuneEnrollOutcome{decided: true, err: fmt.Errorf("%w: CSR parse: %v", intune.ErrChallengeMalformed, perr)} + } + if mErr := claim.DeviceMatchesCSR(csr); mErr != nil { + s.logger.Warn("SCEP enrollment rejected: Intune claim does not match CSR", + "transaction_id", transactionID, "error", mErr) + return intuneEnrollOutcome{decided: true, err: mErr} + } + + // Replay protection — runs AFTER claim validation + CSR binding so a + // failed validation doesn't burn a replay slot on a legitimate retry. + if s.intuneReplayCache != nil && claim.Nonce != "" { + if !s.intuneReplayCache.CheckAndInsert(claim.Nonce, now) { + err := fmt.Errorf("%w: nonce=%q", intune.ErrChallengeReplay, claim.Nonce) + s.logger.Warn("SCEP enrollment rejected: Intune challenge nonce replay", + "transaction_id", transactionID, "subject", claim.Subject) + return intuneEnrollOutcome{decided: true, err: err} + } + } + + // Per-device rate limit — second line of defense against a compromised + // Connector signing key issuing many DIFFERENT valid challenges for + // the same device. + if s.intuneRateLimiter != nil { + if rlErr := s.intuneRateLimiter.Allow(claim.Subject, claim.Issuer, now); rlErr != nil { + s.logger.Warn("SCEP enrollment rejected: Intune per-device rate limit exceeded", + "transaction_id", transactionID, "subject", claim.Subject, "issuer", claim.Issuer) + return intuneEnrollOutcome{decided: true, err: rlErr} + } + } + + // Optional V3-Pro compliance hook (nil-default no-op in V2). Runs LAST + // so we don't ping the compliance API for requests we'd reject anyway. 
+ if s.complianceCheck != nil { + compliant, reason, cerr := s.complianceCheck(ctx, claim) + if cerr != nil { + s.logger.Error("Intune compliance check returned error; failing closed", + "transaction_id", transactionID, "subject", claim.Subject, "error", cerr) + return intuneEnrollOutcome{decided: true, err: fmt.Errorf("intune compliance check: %w", cerr)} + } + if !compliant { + s.logger.Warn("SCEP enrollment rejected: device non-compliant per Intune compliance check", + "transaction_id", transactionID, "subject", claim.Subject, "reason", reason) + return intuneEnrollOutcome{decided: true, err: fmt.Errorf("intune compliance: %s", reason)} + } + } + + return intuneEnrollOutcome{decided: true, claim: claim} +} + +// parseCSRForIntune is a thin wrapper around encoding/pem + x509 that the +// dispatcher uses for the claim ↔ CSR binding check. Kept private + named +// for grepability so a future refactor can swap the parse strategy without +// touching the dispatcher. +func parseCSRForIntune(csrPEM string) (*x509.CertificateRequest, error) { + block, _ := pem.Decode([]byte(csrPEM)) + if block == nil { + return nil, fmt.Errorf("invalid CSR PEM") + } + csr, err := x509.ParseCertificateRequest(block.Bytes) + if err != nil { + return nil, fmt.Errorf("parse CSR: %w", err) + } + return csr, nil } // NewSCEPService creates a new SCEPService for the given issuer connector. @@ -86,6 +374,19 @@ func (s *SCEPService) GetCACert(ctx context.Context) (string, error) { // non-empty branch now uses crypto/subtle.ConstantTimeCompare to avoid leaking // the shared secret through a response-time side channel. func (s *SCEPService) PKCSReq(ctx context.Context, csrPEM string, challengePassword string, transactionID string) (*domain.SCEPEnrollResult, error) { + // SCEP RFC 8894 + Intune master bundle Phase 8.3: try the Intune + // dispatcher first. 
When it returns decided=true the service has + // already made the call (success or typed failure); when decided=false + // we fall through to the existing static-challenge path. The + // dispatcher gates internally on intuneEnabled + looksIntuneShaped, + // so this is a free no-op for profiles where Intune is disabled. + if outcome := s.dispatchIntuneChallenge(ctx, csrPEM, challengePassword, transactionID); outcome.decided { + if outcome.err != nil { + return nil, fmt.Errorf("intune challenge: %w", outcome.err) + } + return s.processEnrollment(ctx, csrPEM, transactionID, "scep_pkcsreq_intune") + } + // Defense-in-depth: refuse any enrollment when no shared secret is // configured. The server-level pre-flight check in cmd/server/main.go // normally prevents the service from being constructed in this state, but @@ -258,6 +559,29 @@ func (s *SCEPService) PKCSReqWithEnvelope(ctx context.Context, csrPEM string, ch RecipientNonce: envelope.SenderNonce, } + // SCEP RFC 8894 + Intune master bundle Phase 8.3: same dispatcher as + // PKCSReq, applied to the RFC 8894 path. The dispatcher runs AFTER the + // EnvelopedData decryption + POPO verification (handler-side, before + // the service is invoked) but BEFORE the static-challenge fallback. On + // Intune-validation failure the response envelope carries a typed + // FailInfo so the CertRep wire shape is preserved (RFC 8894 §3.3). 
+ if outcome := s.dispatchIntuneChallenge(ctx, csrPEM, challengePassword, envelope.TransactionID); outcome.decided { + if outcome.err != nil { + resp.Status = domain.SCEPStatusFailure + resp.FailInfo = mapIntuneErrorToFailInfo(outcome.err) + return resp + } + result, err := s.processEnrollment(ctx, csrPEM, envelope.TransactionID, "scep_pkcsreq_intune") + if err != nil { + resp.Status = domain.SCEPStatusFailure + resp.FailInfo = mapServiceErrorToFailInfo(err) + return resp + } + resp.Status = domain.SCEPStatusSuccess + resp.Result = result + return resp + } + // Defense-in-depth: refuse any enrollment when no shared secret is // configured. Mirrors PKCSReq's gate. Returning nil signals 'let the // caller translate to HTTP 403' — the existing PKCSReq path returns @@ -287,6 +611,41 @@ func (s *SCEPService) PKCSReqWithEnvelope(ctx context.Context, csrPEM string, ch return resp } +// mapIntuneErrorToFailInfo maps a typed Intune-validation error to the +// SCEP failInfo code RFC 8894 §3.2.1.4.5 enumerates. Mapping rationale: +// +// - Signature / replay / wrong-audience / expired / not-yet-valid → +// BadMessageCheck (the request didn't pass integrity / freshness +// checks; same wire shape as a tampered EnvelopedData). +// - Claim mismatches (CN / SAN-DNS / SAN-RFC822 / SAN-UPN) → BadRequest +// (the request was well-formed and signed but the asserted identity +// doesn't match what the device actually requested). +// - Rate-limited / unknown-version → BadRequest (no better wire-level +// code; the audit log carries the exact reason). +// - Malformed → BadRequest. +// - Compliance failure → BadRequest (V3-Pro can swap to a more +// specific code if it cares). 
+func mapIntuneErrorToFailInfo(err error) domain.SCEPFailInfo { + if err == nil { + return domain.SCEPFailBadRequest + } + switch { + case errors.Is(err, intune.ErrChallengeSignature), + errors.Is(err, intune.ErrChallengeExpired), + errors.Is(err, intune.ErrChallengeNotYetValid), + errors.Is(err, intune.ErrChallengeWrongAudience), + errors.Is(err, intune.ErrChallengeReplay): + return domain.SCEPFailBadMessageCheck + case errors.Is(err, intune.ErrClaimCNMismatch), + errors.Is(err, intune.ErrClaimSANDNSMismatch), + errors.Is(err, intune.ErrClaimSANRFC822Mismatch), + errors.Is(err, intune.ErrClaimSANUPNMismatch): + return domain.SCEPFailBadRequest + default: + return domain.SCEPFailBadRequest + } +} + // mapServiceErrorToFailInfo translates a service-layer error into the // SCEP failInfo code RFC 8894 §3.2.1.4.5 enumerates. The mapping mirrors // the table in PKCSReqWithEnvelope's docblock; defaults to BadRequest @@ -345,6 +704,38 @@ func (s *SCEPService) RenewalReqWithEnvelope(ctx context.Context, csrPEM string, RecipientNonce: envelope.SenderNonce, } + // SCEP RFC 8894 + Intune master bundle Phase 8.3: Intune dispatcher + // applies to RenewalReq too. The chain-validation gate further down + // stays in place — Intune-managed devices still need to present a + // previously-issued cert as POPO when re-enrolling. The Intune + // validator covers "is this a legitimate Intune challenge?" and the + // chain check covers "did this device hold a prior cert from this + // issuer?" — both must pass. + if outcome := s.dispatchIntuneChallenge(ctx, csrPEM, challengePassword, envelope.TransactionID); outcome.decided { + if outcome.err != nil { + resp.Status = domain.SCEPStatusFailure + resp.FailInfo = mapIntuneErrorToFailInfo(outcome.err) + return resp + } + // Chain-of-trust check still applies on renewal even via Intune. 
+ if err := s.verifyRenewalSignerCertChain(ctx, envelope.SignerCert); err != nil { + s.logger.Warn("SCEP renewal rejected: signer cert chain invalid (Intune path)", + "transaction_id", envelope.TransactionID, "error", err.Error()) + resp.Status = domain.SCEPStatusFailure + resp.FailInfo = domain.SCEPFailBadMessageCheck + return resp + } + result, err := s.processEnrollment(ctx, csrPEM, envelope.TransactionID, "scep_renewalreq_intune") + if err != nil { + resp.Status = domain.SCEPStatusFailure + resp.FailInfo = mapServiceErrorToFailInfo(err) + return resp + } + resp.Status = domain.SCEPStatusSuccess + resp.Result = result + return resp + } + // Same challenge-password gate as PKCSReqWithEnvelope. Defense in depth // even though the RenewalReq path additionally verifies the signing // cert chain — a stolen/leaked challenge password combined with a diff --git a/internal/service/scep_intune_test.go b/internal/service/scep_intune_test.go new file mode 100644 index 0000000..c144749 --- /dev/null +++ b/internal/service/scep_intune_test.go @@ -0,0 +1,487 @@ +package service + +import ( + "context" + "crypto" + "crypto/ecdsa" + "crypto/elliptic" + "crypto/rand" + "crypto/rsa" + "crypto/sha256" + "crypto/x509" + "crypto/x509/pkix" + "encoding/base64" + "encoding/json" + "errors" + "log/slog" + "math/big" + "os" + "strings" + "testing" + "time" + + "github.com/shankar0123/certctl/internal/scep/intune" +) + +// SCEP RFC 8894 + Intune master bundle Phase 8.9 — service-layer dispatcher +// tests. Exercises the looksIntuneShaped pre-check, the validator + claim +// binding, the replay cache + per-device rate limiter integration, and the +// nil-default compliance hook seam. + +// ------------------------------------------------------------------ +// Test plumbing. 
+// ------------------------------------------------------------------ + +func newTestSCEPLogger() *slog.Logger { + return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelError})) +} + +// intuneTestConn manufactures an ephemeral RSA Connector signing cert + key +// for tests that build challenges by hand. Mirrors challenge_test.go's +// helper but lives in the service package so tests can exercise the full +// dispatcher path. +type intuneTestConn struct { + key *rsa.PrivateKey + cert *x509.Certificate +} + +func newIntuneTestConn(t *testing.T) intuneTestConn { + t.Helper() + key, err := rsa.GenerateKey(rand.Reader, 2048) + if err != nil { + t.Fatalf("rsa.GenerateKey: %v", err) + } + tmpl := &x509.Certificate{ + SerialNumber: big.NewInt(1), + Subject: pkix.Name{CommonName: "test-intune-connector"}, + NotBefore: time.Now().Add(-1 * time.Hour), + NotAfter: time.Now().Add(365 * 24 * time.Hour), + BasicConstraintsValid: true, + } + der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key) + if err != nil { + t.Fatalf("x509.CreateCertificate: %v", err) + } + cert, err := x509.ParseCertificate(der) + if err != nil { + t.Fatalf("x509.ParseCertificate: %v", err) + } + return intuneTestConn{key: key, cert: cert} +} + +// signTestChallenge hand-builds a signed Intune-shaped challenge with the +// caller-supplied claim payload. Returns the wire-format string ready to +// pass as the "challenge password" argument to PKCSReq. +func (c intuneTestConn) signTestChallenge(t *testing.T, payload any) string { + t.Helper() + hdr, _ := json.Marshal(map[string]string{"alg": "RS256", "typ": "JWT"}) + pl, _ := json.Marshal(payload) + signingInput := base64.RawURLEncoding.EncodeToString(hdr) + "." 
+ base64.RawURLEncoding.EncodeToString(pl) + h := sha256.Sum256([]byte(signingInput)) + sig, err := rsa.SignPKCS1v15(rand.Reader, c.key, crypto.SHA256, h[:]) + if err != nil { + t.Fatalf("rsa.SignPKCS1v15: %v", err) + } + return signingInput + "." + base64.RawURLEncoding.EncodeToString(sig) +} + +// holderFromCerts writes a static slice of certs to a temp PEM bundle and +// loads it through the real on-disk loader (intune.NewTrustAnchorHolder), +// so tests exercise the production parse path without a checked-in fixture. +func holderFromCerts(t *testing.T, certs []*x509.Certificate) *intune.TrustAnchorHolder { + t.Helper() + dir := t.TempDir() + path := dir + "/intune-trust.pem" + // Write a real bundle so the holder can Reload later if the test wants. + body := []byte{} + for _, c := range certs { + body = append(body, []byte("-----BEGIN CERTIFICATE-----\n")...) + b64 := base64.StdEncoding.EncodeToString(c.Raw) + // Wrap to 64-char lines per PEM convention. + for len(b64) > 64 { + body = append(body, []byte(b64[:64]+"\n")...) + b64 = b64[64:] + } + body = append(body, []byte(b64+"\n-----END CERTIFICATE-----\n")...) + } + if err := os.WriteFile(path, body, 0o600); err != nil { + t.Fatalf("WriteFile trust bundle: %v", err) + } + holder, err := intune.NewTrustAnchorHolder(path, newTestSCEPLogger()) + if err != nil { + t.Fatalf("NewTrustAnchorHolder: %v", err) + } + return holder +} + +// validIntunePayload returns a v1 challenge payload whose claim matches a +// CSR generated via generateCSRPEM(t, "device.example.com", []string{...}). +// Tests can mutate it before signing to exercise individual failure modes.
+func validIntunePayload(now time.Time) map[string]any { + return map[string]any{ + "iss": "test-intune-connector-installation", + "sub": "device-guid-001", + "aud": "https://certctl.example.com/scep/corp", + "iat": now.Add(-1 * time.Minute).Unix(), + "exp": now.Add(59 * time.Minute).Unix(), + "nonce": "nonce-001", + "device_name": "device.example.com", + "san_dns": []string{"device.example.com"}, + } +} + +// ------------------------------------------------------------------ +// Dispatcher behavior. +// ------------------------------------------------------------------ + +func TestSCEPService_LooksIntuneShaped(t *testing.T) { + cases := []struct { + name string + in string + want bool + }{ + {"empty", "", false}, + {"short static password", "secret123", false}, + {"long but no dots", strings.Repeat("a", 300), false}, + {"long with two dots (intune-shaped)", strings.Repeat("a", 80) + "." + strings.Repeat("b", 80) + "." + strings.Repeat("c", 80), true}, + // Long enough to pass the length gate, so only the dot count rejects it. + {"long with three dots (not intune)", strings.Repeat("a", 80) + "." + strings.Repeat("b", 80) + "." + strings.Repeat("c", 80) + "." + strings.Repeat("d", 80), false}, + // 66*3 + 2 dots = 200 bytes: right shape, rejected only by the length gate. + {"exactly 200 bytes with two dots (boundary, not intune)", strings.Repeat("a", 66) + "." + strings.Repeat("b", 66) + "." + strings.Repeat("c", 66), false}, + {"201 bytes with two dots (just over boundary, intune-shaped)", strings.Repeat("a", 67) + "." + strings.Repeat("b", 66) + "." + strings.Repeat("c", 66), true}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + if got := looksIntuneShaped(tc.in); got != tc.want { + t.Errorf("looksIntuneShaped(%q) = %v, want %v", tc.in[:min(40, len(tc.in))]+"…", got, tc.want) + } + }) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_Success(t *testing.T) { + conn := newIntuneTestConn(t) + mockIssuer := &mockIssuerConnector{} + auditRepo := newMockAuditRepository() + auditSvc := NewAuditService(auditRepo) + + // Service has the legacy challenge password set (we want to verify the + // dispatcher takes precedence over the static path when intune-shaped).
+ svc := NewSCEPService("iss-local", mockIssuer, auditSvc, newTestSCEPLogger(), "static-secret") + holder := holderFromCerts(t, []*x509.Certificate{conn.cert}) + svc.SetIntuneIntegration( + holder, + "https://certctl.example.com/scep/corp", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + + result, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-intune-001") + if err != nil { + t.Fatalf("PKCSReq: %v", err) + } + if result == nil || result.CertPEM == "" { + t.Fatalf("expected non-empty cert; got %#v", result) + } + + // The audit event should carry the Intune-specific action code so + // operators can grep the audit log to count Intune enrollments + // distinct from static-challenge enrollments. + if len(auditRepo.Events) == 0 { + t.Fatalf("expected an audit event") + } + if got := auditRepo.Events[0].Action; got != "scep_pkcsreq_intune" { + t.Errorf("audit action = %q, want scep_pkcsreq_intune (Phase 8.4)", got) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_StaticChallengeStillWorks(t *testing.T) { + // Operator deploy that has Intune enabled on a profile but a device + // sends a SHORT static challenge — must still work via the fallback path. 
+ conn := newIntuneTestConn(t) + mockIssuer := &mockIssuerConnector{} + svc := NewSCEPService("iss-local", mockIssuer, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "https://certctl.example.com/scep/corp", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + if _, err := svc.PKCSReq(context.Background(), csrPEM, "static-secret", "txn-static-001"); err != nil { + t.Fatalf("static-challenge fallback should still work when Intune enabled: %v", err) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_TamperedChallengeRejected(t *testing.T) { + conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + good := conn.signTestChallenge(t, validIntunePayload(time.Now())) + parts := strings.Split(good, ".") + sig, _ := base64.RawURLEncoding.DecodeString(parts[2]) + sig[0] ^= 0xFF + parts[2] = base64.RawURLEncoding.EncodeToString(sig) + tampered := strings.Join(parts, ".") + + _, err := svc.PKCSReq(context.Background(), csrPEM, tampered, "txn-tamper-001") + if err == nil { + t.Fatal("expected tampered challenge to be rejected") + } + if !errors.Is(err, intune.ErrChallengeSignature) { + t.Errorf("got %v, want errors.Is(ErrChallengeSignature)", err) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_ClaimMismatchRejected(t *testing.T) { + conn := newIntuneTestConn(t) + svc := 
NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + + // CSR's CN ("attacker-host.example.com") does NOT match the claim's + // device_name ("device.example.com"). + csrPEM := generateCSRPEM(t, "attacker-host.example.com", []string{"attacker-host.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + + _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-mismatch-001") + if err == nil { + t.Fatal("expected claim mismatch to be rejected") + } + if !errors.Is(err, intune.ErrClaimCNMismatch) { + t.Errorf("got %v, want ErrClaimCNMismatch", err) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_ReplayDetected(t *testing.T) { + conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(0, 24*time.Hour, 100), // disable rate limit so we don't trip THAT first + ) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + + if _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-001"); err != nil { + t.Fatalf("first call should succeed: %v", err) + } + _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-002") + if !errors.Is(err, intune.ErrChallengeReplay) { + t.Fatalf("got %v, want ErrChallengeReplay on the second call", err) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_RateLimited(t *testing.T) 
{ + conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + // Replay cache must not block us — use disjoint nonces per call. + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(2, 24*time.Hour, 100), // limit = 2 + ) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + + for i := 0; i < 2; i++ { + pl := validIntunePayload(time.Now()) + pl["nonce"] = "nonce-" + string(rune('a'+i)) + ch := conn.signTestChallenge(t, pl) + if _, err := svc.PKCSReq(context.Background(), csrPEM, ch, "txn-allow"); err != nil { + t.Fatalf("call %d should succeed: %v", i+1, err) + } + } + // 3rd call same (Subject, Issuer) → rate limited. + pl := validIntunePayload(time.Now()) + pl["nonce"] = "nonce-third" + third := conn.signTestChallenge(t, pl) + _, err := svc.PKCSReq(context.Background(), csrPEM, third, "txn-block") + if !errors.Is(err, intune.ErrRateLimited) { + t.Fatalf("got %v, want ErrRateLimited on 3rd call (cap=2)", err) + } +} + +// ------------------------------------------------------------------ +// Compliance-hook seam (Phase 8.7). +// ------------------------------------------------------------------ + +func TestSCEPService_PKCSReq_IntuneDispatcher_ComplianceHookNilDefault(t *testing.T) { + // Default state: no hook installed, enrollments proceed. 
+ conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + if _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-nil-hook"); err != nil { + t.Fatalf("nil-default compliance hook should be a no-op: %v", err) + } +} + +func TestSCEPService_PKCSReq_IntuneDispatcher_ComplianceHookDeniesNonCompliant(t *testing.T) { + conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + svc.SetComplianceCheck(func(ctx context.Context, claim *intune.ChallengeClaim) (bool, string, error) { + return false, "device under remediation", nil + }) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-noncompliant") + if err == nil { + t.Fatal("non-compliant device must be rejected") + } + if !strings.Contains(err.Error(), "intune compliance") { + t.Errorf("error should reference compliance reason: %v", err) + } + if !strings.Contains(err.Error(), "device under remediation") { + t.Errorf("error should preserve compliance reason for audit: %v", err) + } +} + +func 
TestSCEPService_PKCSReq_IntuneDispatcher_ComplianceHookErrorFailsClosed(t *testing.T) { + conn := newIntuneTestConn(t) + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 60*time.Minute, + intune.NewReplayCache(60*time.Minute, 100), + intune.NewPerDeviceRateLimiter(3, 24*time.Hour, 100), + ) + svc.SetComplianceCheck(func(ctx context.Context, claim *intune.ChallengeClaim) (bool, string, error) { + return false, "", errors.New("graph API down") + }) + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + challenge := conn.signTestChallenge(t, validIntunePayload(time.Now())) + _, err := svc.PKCSReq(context.Background(), csrPEM, challenge, "txn-compl-err") + if err == nil { + t.Fatal("compliance API error must fail closed (deny)") + } +} + +// ------------------------------------------------------------------ +// IntuneEnabled accessor + miscellaneous wiring. +// ------------------------------------------------------------------ + +func TestSCEPService_IntuneEnabled_AccessorReflectsState(t *testing.T) { + svc := NewSCEPService("iss-local", &mockIssuerConnector{}, nil, newTestSCEPLogger(), "static") + if svc.IntuneEnabled() { + t.Fatal("freshly-built service must report IntuneEnabled=false") + } + conn := newIntuneTestConn(t) + svc.SetIntuneIntegration( + holderFromCerts(t, []*x509.Certificate{conn.cert}), + "", + 0, + nil, + nil, + ) + if !svc.IntuneEnabled() { + t.Fatal("after SetIntuneIntegration, IntuneEnabled() must report true") + } +} + +func TestSCEPService_PKCSReq_IntuneDisabled_StaticPathUnchanged(t *testing.T) { + // Sanity: a service that NEVER had SetIntuneIntegration called must + // behave exactly like the pre-Phase-8 service. This pins the no-regression + // guarantee for the broad set of profiles that won't enable Intune. 
+ mockIssuer := &mockIssuerConnector{} + svc := NewSCEPService("iss-local", mockIssuer, NewAuditService(newMockAuditRepository()), newTestSCEPLogger(), "static-secret") + + csrPEM := generateCSRPEM(t, "device.example.com", []string{"device.example.com"}) + // Submit something Intune-shaped — without SetIntuneIntegration this + // must NOT route through the dispatcher (looksIntuneShaped + intuneEnabled + // are AND-gated). It will fall through to the static compare and reject. + intuneShaped := strings.Repeat("a", 80) + "." + strings.Repeat("b", 80) + "." + strings.Repeat("c", 80) + if _, err := svc.PKCSReq(context.Background(), csrPEM, intuneShaped, "txn-noop"); err == nil { + t.Fatal("static path with wrong password must reject (we passed an intune-shaped string but Intune is off)") + } + // Now submit the right static password — must succeed. + if _, err := svc.PKCSReq(context.Background(), csrPEM, "static-secret", "txn-noop-2"); err != nil { + t.Fatalf("static path with right password must work: %v", err) + } +} + +// ------------------------------------------------------------------ +// IntuneFailReason mapping. 
+// ------------------------------------------------------------------ + +func TestIntuneFailReason_AllTypedErrorsMapped(t *testing.T) { + cases := []struct { + err error + want string + }{ + {nil, "success"}, + {intune.ErrChallengeSignature, "signature_invalid"}, + {intune.ErrChallengeExpired, "expired"}, + {intune.ErrChallengeNotYetValid, "not_yet_valid"}, + {intune.ErrChallengeWrongAudience, "wrong_audience"}, + {intune.ErrChallengeReplay, "replay"}, + {intune.ErrChallengeUnknownVersion, "unknown_version"}, + {intune.ErrChallengeMalformed, "malformed"}, + {intune.ErrRateLimited, "rate_limited"}, + {intune.ErrClaimCNMismatch, "claim_mismatch"}, + {intune.ErrClaimSANDNSMismatch, "claim_mismatch"}, + {intune.ErrClaimSANRFC822Mismatch, "claim_mismatch"}, + {intune.ErrClaimSANUPNMismatch, "claim_mismatch"}, + {errors.New("something else"), "malformed"}, // default bucket + } + for _, tc := range cases { + got := intuneFailReason(tc.err) + if got != tc.want { + t.Errorf("intuneFailReason(%v) = %q, want %q", tc.err, got, tc.want) + } + } +} + +// ecdsa and elliptic are imported but not yet used by any test in this +// file; referencing them here keeps the import block compiling until +// ECDSA-signed Connector fixtures are added. +var ( + _ = ecdsa.GenerateKey + _ = elliptic.P256 +) + +func min(a, b int) int { + if a < b { + return a + } + return b +}