mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:21:37 +00:00
7612da783a
Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the internal/scep/intune validator from Phase 7 into the SCEPService dispatch path, with a SIGHUP-reloadable trust anchor holder, a per-(Subject, Issuer) sliding-window rate limiter, and a nil-default ComplianceCheck seam for V3-Pro.

Operator-visible surface (per-profile, all default to off):

  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CHALLENGE_VALIDITY=60m
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H=3

Per-profile dispatch (Phase 8.8): an operator running corp-laptops through Intune AND IoT devices through static challenge configures INTUNE_ENABLED=true on the corp profile only; the IoT profile's PKCSReq path skips the dispatcher entirely. Mirrors the per-profile shape established by Phase 1.5.

Wire-in surfaces:

* config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of type SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* env-var family. Per-profile Validate gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath OR negative PerDeviceRateLimit24h.

* cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor helper mirrors the preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle shape: fail-loud at boot when the trust anchor file is missing / unreadable / empty / contains an expired cert. The per-profile loop builds the holder + replay cache + rate limiter, calls SetIntuneIntegration on the SCEPService, and starts the SIGHUP watcher. A deferred sweep stops every watcher at shutdown.

* internal/scep/intune/trust_anchor_holder.go (Phase 8.5): TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex-guarded pool + Reload that swaps a fresh slice on success + WatchSIGHUP goroutine that responds to the same SIGHUP the existing TLS-cert watcher uses. A bad reload (parse error, expired cert) keeps the OLD pool in place so a half-rotation doesn't take Intune enrollment down (same fail-safe pattern). Operators rotate via the on-disk file then 'kill -HUP <certctl-pid>'.

* internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry map cap (matches replay cache); at-cap drops the bucket whose newest timestamp is the oldest. Default 3 enrollments per 24h covers legitimate first-cert + recovery + post-wipe re-enrollment but blocks bulk enumeration from a compromised Connector signing key. maxN <= 0 disables the limiter for tests + the rare operator who wants no per-device cap. Empty subject short-circuits to allow (defense-in-depth: caller's claim validation rejects empty-subject upstream; no shared bucket on ''). Why hand-rolled instead of golang.org/x/time/rate: the rate package is in go.sum as an indirect transitive but not a direct dep. ~30 LoC of stdlib avoids creating a new direct dep.

* internal/service/scep.go (Phase 8.3 + 8.4 + 8.7):
  - SCEPService gains intuneEnabled / intuneTrust / intuneAudience / intuneValidity / intuneReplayCache / intuneRateLimiter / complianceCheck fields.
  - SetIntuneIntegration() constructor-time injection wires the per-profile state. Profiles with INTUNE_ENABLED=false never call this method, so they pay zero overhead.
  - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7).
  - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly two dots). Allowed to false-positive (validator catches malformed → ErrChallengeMalformed); MUST NOT false-negative on real Intune challenges.
  - dispatchIntuneChallenge(): the load-bearing core. Runs ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay cache CheckAndInsert → per-device Allow → optional ComplianceCheck. Each failure leg increments a typed metric label and emits an audit-friendly Warn log line.
  - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call dispatchIntuneChallenge first; on outcome.decided=true they either short-circuit (with a typed-error → SCEPFailInfo mapping) or call processEnrollment with action='scep_pkcsreq_intune' (so audit greps can count Intune-vs-static enrollments).
  - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per RFC 8894 §3.2.1.4.5 (signature/replay/expired → BadMessageCheck; claim-mismatch → BadRequest; default → BadRequest).
  - intuneFailReason(): typed-error → metric label ('signature_invalid' / 'expired' / 'rate_limited' / etc.). Default 'malformed' so a previously-unseen error category still surfaces in the metric for follow-up.
  - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs in via SetComplianceCheck to call Microsoft Graph's compliance API. Returns (compliant, reason, err). nil-err + compliant=false → CertRep FAILURE + 'compliance' reason in audit. err != nil → fail-safe deny (V3-Pro module is responsible for any 'permit on API failure' policy).

* internal/service/scep.go also gains parseCSRForIntune(): a small private wrapper around encoding/pem + x509 used by the dispatcher for the claim ↔ CSR binding check (separated from the broader processEnrollment because we want to bind BEFORE consuming the replay-cache slot).
Tests (gates: ≥85% coverage on intune package, ≥70% on service):

* scep_intune_test.go (in internal/service): 14 dispatcher tests covering happy-path Intune enrollment + static-challenge fallback + tampered-challenge reject + claim-mismatch reject + replay detected + rate-limited + compliance-hook nil-default + compliance-hook denies non-compliant + compliance-hook error fails closed + IntuneEnabled accessor + 'no IntuneEnabled = static path unchanged' regression pin + intuneFailReason mapping for every typed error + looksIntuneShaped boundary cases.

* trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle, NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath, ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe semantics that make the SIGHUP path operator-friendly), WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap pattern mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean (does NOT fire SIGHUP after stop; same caveat as the TLS test: the Go runtime would otherwise terminate the test runner on the next SIGHUP since signal.Stop has removed the handler).

* rate_limit_test.go (in internal/scep/intune): AllowsUpToCap, DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0), NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth against an empty-subject DoS chokepoint), DefaultCapsHonored, MapCapEvictsOldest (at-cap eviction branch), ConcurrentRaceFree (50 goroutines × 200 inserts), pruneOlderThan + the no-op case.

Verification:

* gofmt -l on all touched files: clean
* go vet ./... : clean
* staticcheck on intune/service/config/cmd-server: clean
* go test -count=1 -cover ./internal/scep/intune/...: 94.8% (target ≥85%)
* go test -short across intune+service+config+handler+cmd-server: all green
* G-3 docs-drift CI guard reproduced locally: docs-only filtered=empty, config-only=empty. The new env vars match the existing CERTCTL_SCEP_ allowlist prefix.
Refs:
  cowork/scep-rfc8894-intune-master-prompt.md::Phase 8
  cowork/scep-rfc8894-intune/progress.md

Constitutional rule: 'Always take the complete path, not the easy path' (cowork/CLAUDE.md::Operating Rules). The operator can flip CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true and observe the dispatcher pick up Intune-shaped challenges end-to-end with no further code changes. Foundation + plumbing ship together.
235 lines
7.4 KiB
Go
package intune

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"io"
	"log/slog"
	"math/big"
	"os"
	"path/filepath"
	"strings"
	"syscall"
	"testing"
	"time"
)

// silentTestLogger returns a logger that drops everything; the SIGHUP watcher
// path emits Info logs we don't want fouling test output.
func silentTestLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelError + 10}))
}

// writeTestBundle writes a PEM bundle of the given certs at path with mode 0600.
func writeTestBundle(t *testing.T, path string, certs []*x509.Certificate) {
	t.Helper()
	body := []byte{}
	for _, c := range certs {
		body = append(body, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: c.Raw})...)
	}
	if err := os.WriteFile(path, body, 0o600); err != nil {
		t.Fatalf("WriteFile: %v", err)
	}
}

// freshHolderCert is a small factory for a self-signed EC cert with a
// caller-controlled CN + lifetime. Used by Reload tests that swap the
// on-disk pool between calls.
func freshHolderCert(t *testing.T, cn string, notAfter time.Time) *x509.Certificate {
	t.Helper()
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		t.Fatalf("ecdsa.GenerateKey: %v", err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-1 * time.Hour),
		NotAfter:     notAfter,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		t.Fatalf("x509.CreateCertificate: %v", err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		t.Fatalf("x509.ParseCertificate: %v", err)
	}
	return cert
}

func TestTrustAnchorHolder_NewLoadsBundle(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "intune-trust.pem")
	cert := freshHolderCert(t, "initial-conn", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{cert})

	holder, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatalf("NewTrustAnchorHolder: %v", err)
	}
	got := holder.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "initial-conn" {
		t.Fatalf("Get returned %#v, want one cert with CN=initial-conn", got)
	}
	if holder.Path() != path {
		t.Errorf("Path = %q, want %q", holder.Path(), path)
	}
}

func TestTrustAnchorHolder_NewRequiresLogger(t *testing.T) {
	if _, err := NewTrustAnchorHolder("/nonexistent", nil); err == nil {
		t.Fatal("nil logger must error")
	}
}

func TestTrustAnchorHolder_NewSurfacesLoadError(t *testing.T) {
	if _, err := NewTrustAnchorHolder("/path/that/does/not/exist.pem", silentTestLogger()); err == nil {
		t.Fatal("missing file must error")
	}
}

func TestTrustAnchorHolder_ReloadHappyPath(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	c1 := freshHolderCert(t, "rev-1", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c1})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Rotate on disk and call Reload.
	c2 := freshHolderCert(t, "rev-2", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c2})
	if err := h.Reload(); err != nil {
		t.Fatalf("Reload: %v", err)
	}
	got := h.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "rev-2" {
		t.Errorf("after Reload Get = %#v, want one cert CN=rev-2", got)
	}
}

func TestTrustAnchorHolder_ReloadKeepsOldOnFailure(t *testing.T) {
	// Mid-rotation half-file: operator overwrites the bundle with garbage
	// → Reload errors → holder must still serve the OLD pool. Without this
	// fail-safe a single typo would take Intune enrollment down for the
	// whole window until a re-rotate.
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	good := freshHolderCert(t, "stable", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{good})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Overwrite with content that LoadTrustAnchor will reject (no PEM blocks).
	if err := os.WriteFile(path, []byte("garbage"), 0o600); err != nil {
		t.Fatal(err)
	}
	if err := h.Reload(); err == nil {
		t.Fatal("Reload from garbage file must error")
	}

	// Old pool still served.
	got := h.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "stable" {
		t.Errorf("after failed Reload Get should still be the pre-Reload pool; got %#v", got)
	}
}

func TestTrustAnchorHolder_ReloadKeepsOldOnExpired(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	good := freshHolderCert(t, "still-valid", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{good})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Operator rotates to a cert that's already expired (their script
	// pulled an old bundle by mistake). Reload should error AND the holder
	// should retain the previous good pool; these are exactly the fail-safe
	// semantics LoadTrustAnchor enforces at startup.
	expired := freshHolderCert(t, "expired-conn", time.Now().Add(-1*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{expired})

	if err := h.Reload(); err == nil {
		t.Fatal("Reload with expired cert must error")
	}
	if !strings.Contains(h.Get()[0].Subject.CommonName, "still-valid") {
		t.Errorf("after expired-cert Reload, holder should retain old pool")
	}
}

func TestTrustAnchorHolder_WatchSIGHUPReloadsPool(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	c1 := freshHolderCert(t, "rev-pre-sighup", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c1})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}
	stop := h.WatchSIGHUP()
	defer stop()

	// Rotate on disk, then send SIGHUP to our own process and poll for the swap.
	c2 := freshHolderCert(t, "rev-post-sighup", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c2})
	if err := syscall.Kill(syscall.Getpid(), syscall.SIGHUP); err != nil {
		t.Fatalf("send SIGHUP: %v", err)
	}

	// Poll for up to 2 seconds.
	deadline := time.Now().Add(2 * time.Second)
	for {
		got := h.Get()
		if len(got) == 1 && got[0].Subject.CommonName == "rev-post-sighup" {
			return
		}
		if time.Now().After(deadline) {
			t.Fatalf("post-SIGHUP pool not swapped in 2s; current CN=%q", got[0].Subject.CommonName)
		}
		time.Sleep(20 * time.Millisecond)
	}
}

func TestTrustAnchorHolder_WatchSIGHUPStopIsClean(t *testing.T) {
	// Mirrors cmd/server/tls_test.go::TestCertHolder_WatchSIGHUP_StopExits:
	// we do NOT fire a SIGHUP after stop(), because once signal.Stop has
	// removed our handler the kernel's default action on SIGHUP is to
	// terminate the process; it would kill the test runner. The contract
	// we need to pin is "stop() is synchronous and safe", which we
	// demonstrate by closing the watcher and verifying the holder still
	// serves the original cert without panic.
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	writeTestBundle(t, path, []*x509.Certificate{
		freshHolderCert(t, "stop-test", time.Now().Add(30*24*time.Hour)),
	})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}
	stop := h.WatchSIGHUP()
	stop()
	time.Sleep(50 * time.Millisecond) // let the goroutine fully exit

	if cn := h.Get()[0].Subject.CommonName; cn != "stop-test" {
		t.Errorf("after stop CN = %q, want unchanged stop-test", cn)
	}
}