mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-12 21:18:55 +00:00
7612da783a
Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the
internal/scep/intune validator from Phase 7 into the SCEPService dispatch
path, with a SIGHUP-reloadable trust anchor holder, a per-(Subject, Issuer)
sliding-window rate limiter, and a nil-default ComplianceCheck seam for
V3-Pro.

Operator-visible surface (per-profile, all default to off):

  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CHALLENGE_VALIDITY=60m
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H=3

Per-profile dispatch (Phase 8.8): an operator running corp-laptops through
Intune AND IoT devices through static challenge configures
INTUNE_ENABLED=true on the corp profile only; the IoT profile's PKCSReq
path skips the dispatcher entirely. Mirrors the per-profile shape
established by Phase 1.5.

Wire-in surfaces:

* config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of type
  SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/
  ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* env-var family. Per-profile
  Validate gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath
  OR negative PerDeviceRateLimit24h.

* cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor
  helper mirrors the preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle
  shape: fail-loud at boot when the trust anchor file is missing /
  unreadable / empty / contains an expired cert. The per-profile loop
  builds the holder + replay cache + rate limiter, calls
  SetIntuneIntegration on the SCEPService, and starts the SIGHUP watcher.
  A deferred sweep stops every watcher at shutdown.

* internal/scep/intune/trust_anchor_holder.go (Phase 8.5):
  TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex-guarded
  pool + Reload that swaps a fresh slice on success + WatchSIGHUP
  goroutine that responds to the same SIGHUP the existing TLS-cert watcher
  uses. A bad reload (parse error, expired cert) keeps the OLD pool in
  place so a half-rotation doesn't take Intune enrollment down (same
  fail-safe pattern). Operators rotate via the on-disk file then
  'kill -HUP <certctl-pid>'.

* internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled
  sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry map
  cap (matches replay cache); at-cap drops the bucket whose newest
  timestamp is the oldest. Default 3 enrollments per 24h covers legitimate
  first-cert + recovery + post-wipe re-enrollment but blocks bulk
  enumeration from a compromised Connector signing key. maxN <= 0 disables
  the limiter for tests + the rare operator who wants no per-device cap.
  Empty subject short-circuits to allow (defense-in-depth: caller's claim
  validation rejects empty-subject upstream; no shared bucket on '').
  Why hand-rolled instead of golang.org/x/time/rate: the rate package is
  in go.sum as an indirect transitive but not a direct dep. ~30 LoC of
  stdlib avoids creating a new direct dep.

* internal/service/scep.go (Phase 8.3 + 8.4 + 8.7):
  - SCEPService gains intuneEnabled / intuneTrust / intuneAudience /
    intuneValidity / intuneReplayCache / intuneRateLimiter /
    complianceCheck fields.
  - SetIntuneIntegration() constructor-time injection wires the
    per-profile state. Profiles with INTUNE_ENABLED=false never call this
    method, so they pay zero overhead.
  - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7).
  - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly two
    dots). Allowed to false-positive (validator catches malformed →
    ErrChallengeMalformed); MUST NOT false-negative on real Intune
    challenges.
  - dispatchIntuneChallenge(): the load-bearing core. Runs
    ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay cache
    CheckAndInsert → per-device Allow → optional ComplianceCheck. Each
    failure leg increments a typed metric label and emits an
    audit-friendly Warn log line.
  - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call
    dispatchIntuneChallenge first; on outcome.decided=true they either
    short-circuit (with a typed-error → SCEPFailInfo mapping) or call
    processEnrollment with action='scep_pkcsreq_intune' (so audit greps
    can count Intune-vs-static enrollments).
  - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per RFC 8894
    §3.2.1.4.5 (signature/replay/expired → BadMessageCheck;
    claim-mismatch → BadRequest; default → BadRequest).
  - intuneFailReason(): typed-error → metric label ('signature_invalid' /
    'expired' / 'rate_limited' / etc.). Default 'malformed' so a
    previously-unseen error category still surfaces in the metric for
    follow-up.
  - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs in
    via SetComplianceCheck to call Microsoft Graph's compliance API.
    Returns (compliant, reason, err). nil-err + compliant=false → CertRep
    FAILURE + 'compliance' reason in audit. err != nil → fail-safe deny
    (V3-Pro module is responsible for any 'permit on API failure'
    policy).

* internal/service/scep.go also gains parseCSRForIntune(), a small
  private wrapper around encoding/pem + x509 used by the dispatcher for
  the claim ↔ CSR binding check (separated from the broader
  processEnrollment because we want to bind BEFORE consuming the
  replay-cache slot).

Tests (gates: ≥85% coverage on intune package, ≥70% on service):

* scep_intune_test.go (in internal/service): 14 dispatcher tests covering
  happy-path Intune enrollment + static-challenge fallback +
  tampered-challenge reject + claim-mismatch reject + replay detected +
  rate-limited + compliance-hook nil-default + compliance-hook denies
  non-compliant + compliance-hook error fails closed + IntuneEnabled
  accessor + 'no IntuneEnabled = static path unchanged' regression pin +
  intuneFailReason mapping for every typed error + looksIntuneShaped
  boundary cases.

* trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle,
  NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath,
  ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe
  semantics that make the SIGHUP path operator-friendly),
  WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap pattern
  mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean (does NOT fire
  SIGHUP after stop; same caveat as the TLS test: the Go runtime would
  otherwise terminate the test runner on the next SIGHUP since
  signal.Stop has removed the handler).

* rate_limit_test.go (in internal/scep/intune): AllowsUpToCap,
  DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0),
  NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth against
  an empty-subject DoS chokepoint), DefaultCapsHonored, MapCapEvictsOldest
  (at-cap eviction branch), ConcurrentRaceFree (50 goroutines × 200
  inserts), pruneOlderThan + the no-op case.

Verification:

* gofmt -l on all touched files: clean
* go vet ./... : clean
* staticcheck on intune/service/config/cmd-server: clean
* go test -count=1 -cover ./internal/scep/intune/...: 94.8% (target ≥85%)
* go test -short across intune+service+config+handler+cmd-server: all
  green
* G-3 docs-drift CI guard reproduced locally: docs-only filtered=empty,
  config-only=empty. The new env vars match the existing CERTCTL_SCEP_
  allowlist prefix.

Refs:
  cowork/scep-rfc8894-intune-master-prompt.md::Phase 8
  cowork/scep-rfc8894-intune/progress.md

Constitutional rule: 'Always take the complete path, not the easy path'
(cowork/CLAUDE.md::Operating Rules): operator can flip
CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true and observe the dispatcher
pick up Intune-shaped challenges end-to-end with no further code changes.
Foundation + plumbing ship together.
144 lines
5.1 KiB
Go
package intune

import (
	"crypto/x509"
	"errors"
	"log/slog"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// TrustAnchorHolder is the SIGHUP-reloadable wrapper around a per-profile
// Intune Connector trust anchor pool.
//
// SCEP RFC 8894 + Intune master bundle Phase 8.5.
//
// Mirrors the shape established by `cmd/server/tls.go::certHolder` for the
// server TLS cert: an RWMutex-guarded pool, a Get accessor that's safe for
// concurrent callers from the request path, a Reload that re-reads the file
// and atomically swaps the slice on success (failure leaves the OLD pool in
// place so a bad reload doesn't take Intune enrollment down), and a
// WatchSIGHUP goroutine that responds to the same SIGHUP the operator uses
// to rotate the server TLS cert.
//
// Why SIGHUP specifically (vs fsnotify or a polling loop): SIGHUP is the
// repo-established convention (see cmd/server/tls.go). fsnotify would add a
// new direct dep + complicate the cleanup story. The operator's
// Connector-rotation script writes the new PEM bundle then sends SIGHUP (the
// same signal that already rotates the server TLS cert), and each holder
// swaps its own state atomically.
//
// Concurrency contract:
//   - Get returns the pool slice header by value; the slice itself is
//     immutable per-snapshot (Reload swaps a fresh slice rather than
//     mutating the existing one). Callers may iterate the returned slice
//     without holding any lock.
//   - Reload acquires a write lock briefly for the swap. Concurrent Get
//     calls block only for that swap window (microseconds).
//   - Each WatchSIGHUP goroutine runs at most one Reload at a time.
type TrustAnchorHolder struct {
	mu     sync.RWMutex
	certs  []*x509.Certificate
	path   string
	logger *slog.Logger
}

// NewTrustAnchorHolder loads the trust bundle and returns a holder. Returns
// the same fail-loud error LoadTrustAnchor does on initial load; the
// startup gate at cmd/server/main.go is supposed to refuse boot when this
// fails. Subsequent Reload errors are non-fatal (logged + old pool retained).
//
// The logger is required (never nil); the caller passes a per-profile
// scoped logger so SIGHUP-reload events show the PathID for triage.
func NewTrustAnchorHolder(path string, logger *slog.Logger) (*TrustAnchorHolder, error) {
	if logger == nil {
		return nil, errors.New("intune: TrustAnchorHolder requires a non-nil logger")
	}
	certs, err := LoadTrustAnchor(path)
	if err != nil {
		return nil, err
	}
	return &TrustAnchorHolder{
		certs:  certs,
		path:   path,
		logger: logger,
	}, nil
}

// Get returns the current trust anchor pool. Safe for concurrent callers;
// the slice header is returned by value and the underlying slice is
// immutable per-snapshot (Reload swaps a fresh slice, doesn't mutate in
// place; see Reload).
func (h *TrustAnchorHolder) Get() []*x509.Certificate {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.certs
}

// Path returns the on-disk path the holder reloads from. Useful for
// observability (admin endpoints, log lines) without exposing the cert
// pool itself.
func (h *TrustAnchorHolder) Path() string {
	return h.path
}

// Reload re-reads the trust anchor file at h.path and atomically swaps the
// pool. Returns the parse error if the new file is invalid; the OLD pool
// stays in place so a bad reload doesn't take Intune enrollment down.
//
// Same fail-safe pattern as cmd/server/tls.go::(*certHolder).Reload: a
// rotation that writes a half-file (operator overwrites the bundle while
// only some of the new certs are in it) would otherwise crash the
// service mid-rotation. Logging + retaining the old pool gives the
// operator a bounded window to fix and re-SIGHUP.
func (h *TrustAnchorHolder) Reload() error {
	certs, err := LoadTrustAnchor(h.path)
	if err != nil {
		return err
	}
	h.mu.Lock()
	h.certs = certs
	h.mu.Unlock()
	return nil
}

// WatchSIGHUP installs a signal handler that calls Reload on each SIGHUP.
// The returned stop function closes the internal done channel; the watcher
// goroutine then stops signal delivery and exits cleanly during shutdown.
// Call stop at most once: a second call would panic on the already-closed
// channel.
//
// Errors from Reload are logged but do not terminate the watcher; the
// operator can fix the files and send another SIGHUP. Mirrors the
// (*certHolder).watchSIGHUP contract exactly.
//
// Multiple holders can coexist: each registers its own goroutine on the
// same SIGHUP signal. signal.Notify multicasts to every registered
// channel, so a single SIGHUP reloads every per-profile Intune trust
// anchor PLUS the server TLS cert in one operator action, which is exactly
// the design requirement (one SIGHUP rotates everything).
func (h *TrustAnchorHolder) WatchSIGHUP() (stop func()) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGHUP)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-ch:
				if err := h.Reload(); err != nil {
					h.logger.Error("Intune trust anchor reload failed; continuing with previous pool",
						"error", err,
						"path", h.path)
					continue
				}
				h.logger.Info("Intune trust anchor reloaded via SIGHUP",
					"path", h.path,
					"certs_loaded", len(h.Get()))
			case <-done:
				signal.Stop(ch)
				return
			}
		}
	}()
	return func() { close(done) }
}