mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:51:36 +00:00
fix(security): close BUNDLE 5 — auth, OIDC, MCP, API + browser security edges
Bundle 5 closure (2026-05-13 acquisition diligence audit). 13-finding
security audit pass across the auth / OIDC / MCP / API / browser-
security surface. Five real closures shipped in code, two false-as-
stated findings annotated with the existing implementation, three
operator-decision items documented for v3 follow-up, three doc-only
fixes (auth architecture narrative aligned with shipped OIDC).
Source findings closed (code):
S1 break-glass /auth/breakglass/login lacked the documented
5/min per-source-IP rate limit; handler now owns its own
SlidingWindowLimiter wired at startup. Doc claim turns true.
R6 OIDC test_discovery JWKS probe ran on http.DefaultClient;
now uses an http.Client whose transport wraps
validation.SafeHTTPDialContext. JWKS URI can no longer
pivot into reserved-address ranges via DNS rebinding.
R7 Slack + Teams notifiers built http.Client without the SSRF
dial-time guard. Both New() constructors now install
validation.SafeHTTPDialContext; webhook URLs (operator-
configured via dynamic-config GUI) cannot dial 169.254.x or
in-cluster reserved ranges. Test seam: newForTest bypasses
the guard for httptest's 127.0.0.1 binds, mirroring the
existing internal/connector/notifier/webhook pattern.
RT-L2 CERTCTL_ACME_INSECURE=true now emits a prominent
logger.Warn at server boot. Pre-Bundle-5 the knob silently
disabled ACME directory TLS verification.
Source findings closed (doc):
finding 1 + HIGH-5 Architecture doc claimed no in-process JWT/
OIDC/mTLS/SAML and pointed everyone at the
authenticating-gateway pattern. Auth Bundle 2
(commit dea5053) shipped native OIDC + sessions +
break-glass. New §"In-process authentication surface"
table (api-key / oidc / none) supersedes the old framing;
"Authenticating-gateway pattern (SAML, mTLS-as-auth,
LDAP)" section retained for protocols certctl still
doesn't ship natively.
Source findings verified false (existing implementation):
S4 OIDC email-domain allowlist — `email_domain_test.go`
already pins the strict-equality semantics (subdomain not
auto-accepted, multi-entry no-match path, empty allowlist
accepts all by-design per RFC 9700 §4.1.1).
SEC-L1 CSP / HSTS / referrer-policy headers — already shipped at
internal/api/middleware/securityheaders.go and wired at
cmd/server/main.go L2003+L2027+L2115.
Operator-decision / deferred (tracked in bundle-5 closure doc):
S3 CERTCTL_API_KEYS_NAMED parsing is wired, end-to-end
validation is partial. Operator decides: complete the
named-key middleware path or deprecate the syntax.
S5 Audit-middleware best-effort for read paths;
security-critical writes use WithinTx. Operator decides
per-path escalation.
S8 MCP threat model — the binary is a thin protocol bridge,
no privileges of its own; every tool call carries
CERTCTL_API_KEY and is auth'd + RBAC-gated server-side.
Optional CERTCTL_MCP_READ_ONLY gate tracked as v3.
SEC-H1 2026-05-10 audit CRIT-1/2/4 already closed on master;
CRIT-3/5 status against the spec folder is operator-
workstation-validation-only. Documented for follow-up.
SEC-L2 WebAuthn / FIDO2 / step-up — already documented in
docs/operator/auth-threat-model.md "Threats Bundle 2 does
NOT close". v3 work item per CLAUDE.md decision 12.
Full per-finding rationale + receipts at
docs/operator/security-bundle-5-audit-closure.md.
Verification:
gofmt -l # clean
go vet ./internal/connector/notifier/slack
./internal/connector/notifier/teams ./internal/auth/oidc
./internal/api/handler ./cmd/server # clean
go build ./cmd/server [...] # clean
go test -short -count=1 ./internal/connector/notifier/slack
./internal/connector/notifier/teams ./internal/api/handler
./internal/auth/oidc ./internal/config # PASS
# (slack 0.028s + teams
# 0.023s + handler 11.0s;
# newForTest seam keeps
# httptest tests green)
Audit-Closes: BUNDLE-5 S1 R6 R7 RT-L2 finding-1 HIGH-5
Audit-Verifies-False: S4 SEC-L1
Audit-Defers: S3 S5 S8 SEC-H1 SEC-L2
This commit is contained in:
@@ -32,6 +32,7 @@ import (
|
||||
"github.com/certctl-io/certctl/internal/auth/breakglass"
|
||||
bgdomain "github.com/certctl-io/certctl/internal/auth/breakglass/domain"
|
||||
sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
|
||||
"github.com/certctl-io/certctl/internal/ratelimit"
|
||||
)
|
||||
|
||||
// =============================================================================
|
||||
@@ -51,9 +52,30 @@ type BreakglassService interface {
|
||||
}
|
||||
|
||||
// AuthBreakglassHandler ships the Phase 7.5 surface.
|
||||
//
|
||||
// Bundle 5 closure (S1): the docstring at the top of this file claimed
|
||||
// the login endpoint was "Rate-limited at 5/minute per source IP via
|
||||
// the existing rate limiter middleware" but no per-route limiter was
|
||||
// wired — `/auth/breakglass/login` is registered via `r.mux.Handle`
|
||||
// in router.go::AuthExemptRouterRoutes and bypasses the global RPS
|
||||
// middleware that wraps `r.Register`-mounted routes. The login handler
|
||||
// now owns its own SlidingWindowLimiter (5 attempts / minute / source
|
||||
// IP, 50 000 key cap) so the documented behavior actually ships.
|
||||
//
|
||||
// Wired at startup via SetLoginRateLimiter (called from cmd/server/main.go
|
||||
// alongside the other per-handler rate limiters that close audit
|
||||
// findings H-9 / H-12 / Bundle 3 D7 / etc.). Defense-in-depth: even
|
||||
// when the limiter is nil (legacy / test), the service-layer Argon2id
|
||||
// lockout state machine still protects against brute force — but a
|
||||
// nil limiter is a misconfiguration the integration test catches.
|
||||
type AuthBreakglassHandler struct {
|
||||
svc BreakglassService
|
||||
cookieAttrs SessionCookieAttrs
|
||||
// loginLimiter rate-limits POST /auth/breakglass/login by source IP.
|
||||
// nil-safe: when unset, the handler skips the limiter check and
|
||||
// relies on the service-layer Argon2id lockout. Production deploys
|
||||
// MUST set this via SetLoginRateLimiter.
|
||||
loginLimiter *ratelimit.SlidingWindowLimiter
|
||||
}
|
||||
|
||||
// NewAuthBreakglassHandler constructs the handler.
|
||||
@@ -61,6 +83,13 @@ func NewAuthBreakglassHandler(svc BreakglassService, cookieAttrs SessionCookieAt
|
||||
return &AuthBreakglassHandler{svc: svc, cookieAttrs: cookieAttrs}
|
||||
}
|
||||
|
||||
// SetLoginRateLimiter wires the per-source-IP rate limiter the Login
|
||||
// handler enforces. Bundle 5 closure (S1) — see the AuthBreakglassHandler
|
||||
// type docstring for the full rationale.
|
||||
func (h *AuthBreakglassHandler) SetLoginRateLimiter(l *ratelimit.SlidingWindowLimiter) {
|
||||
h.loginLimiter = l
|
||||
}
|
||||
|
||||
// =============================================================================
|
||||
// 1. Public login endpoint.
|
||||
// =============================================================================
|
||||
@@ -98,6 +127,22 @@ func (h *AuthBreakglassHandler) Login(w http.ResponseWriter, r *http.Request) {
|
||||
}
|
||||
|
||||
ip := clientIPFromRequest(r)
|
||||
|
||||
// Bundle 5 closure (S1): per-source-IP rate limit. 5 attempts /
|
||||
// minute / IP (default; configurable via the constructor at
|
||||
// cmd/server/main.go). Returns 429 with no body so the response
|
||||
// shape matches the rest of the auth surface (scanner-unfriendly).
|
||||
// Audited by the service layer on the next attempt — we don't
|
||||
// audit the rate-limit hit itself here because that would let an
|
||||
// attacker flood the audit table with rate-limit rows from a
|
||||
// single IP.
|
||||
if h.loginLimiter != nil {
|
||||
if err := h.loginLimiter.Allow(ip, time.Now()); err != nil {
|
||||
Error(w, http.StatusTooManyRequests, "too many requests")
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
res, err := h.svc.Authenticate(r.Context(), req.ActorID, req.Password, ip, r.UserAgent())
|
||||
if err != nil {
|
||||
// All authenticate errors map to the SAME 401 + same body.
|
||||
|
||||
@@ -9,10 +9,20 @@ import (
|
||||
"context"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"time"
|
||||
|
||||
gooidc "github.com/coreos/go-oidc/v3/oidc"
|
||||
|
||||
"github.com/certctl-io/certctl/internal/validation"
|
||||
)
|
||||
|
||||
// oidcOutboundTimeout bounds every test-discovery HTTP call (discovery
|
||||
// document fetch + JWKS reachability probe + userinfo if configured).
|
||||
// Shared by the SSRF-safe transport dialer (Bundle 5 R6 closure) and
|
||||
// the http.Client so the dial budget and the read/write budget land
|
||||
// on the same wall-clock horizon.
|
||||
const oidcOutboundTimeout = 10 * time.Second
|
||||
|
||||
// TestDiscoveryResult is the report TestDiscovery returns. The HTTP
|
||||
// layer marshals this verbatim. Each field is independently observable
|
||||
// so the GUI can render a per-check status row.
|
||||
@@ -134,12 +144,35 @@ func (s *Service) TestDiscovery(ctx context.Context, issuerURL string) (*TestDis
|
||||
// Kept distinct from go-oidc's internal JWKS fetcher because we want
|
||||
// to surface the HTTP status to the operator without requiring a
|
||||
// token-verify round-trip.
|
||||
//
|
||||
// Bundle 5 closure (audit R6): the GET runs through an SSRF-safe
|
||||
// http.Client whose transport's DialContext is wrapped in
|
||||
// validation.SafeHTTPDialContext. Pre-Bundle-5 the discovery probe
|
||||
// used http.DefaultClient and could be pointed at reserved-address
|
||||
// ranges via DNS rebinding (operator pastes a JWKS URI from the
|
||||
// dynamic-config GUI; admin RBAC for OIDC providers is sensitive but
|
||||
// not a system-wide super-admin gate). Now the dial-time guard re-
|
||||
// resolves the target host and rejects loopback / link-local /
|
||||
// private + cloud-metadata before any HTTP byte goes out. The
|
||||
// 10-second timeout matches the package-wide oidcOutboundTimeout
|
||||
// budget so token endpoint + JWKS + userinfo probes share the same
|
||||
// wall-clock horizon.
|
||||
var jwksReachable = func(ctx context.Context, jwksURI string) (bool, error) {
|
||||
req, err := http.NewRequestWithContext(ctx, http.MethodGet, jwksURI, nil)
|
||||
if err != nil {
|
||||
return false, err
|
||||
}
|
||||
resp, err := http.DefaultClient.Do(req)
|
||||
client := &http.Client{
|
||||
Timeout: oidcOutboundTimeout,
|
||||
Transport: &http.Transport{
|
||||
DialContext: validation.SafeHTTPDialContext(oidcOutboundTimeout),
|
||||
MaxIdleConns: 10,
|
||||
IdleConnTimeout: 90 * time.Second,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
ExpectContinueTimeout: 1 * time.Second,
|
||||
},
|
||||
}
|
||||
resp, err := client.Do(req)
|
||||
if err != nil {
|
||||
return false, err
|
||||
}
|
||||
|
||||
@@ -8,8 +8,16 @@ import (
|
||||
"io"
|
||||
"net/http"
|
||||
"time"
|
||||
|
||||
"github.com/certctl-io/certctl/internal/validation"
|
||||
)
|
||||
|
||||
// slackClientTimeout bounds every outbound Slack webhook request and
|
||||
// its resolution/dial phase. Shared by the transport dialer (SSRF
|
||||
// guard) and the http.Client so DNS rebinding and the read/write
|
||||
// budget land on the same time horizon.
|
||||
const slackClientTimeout = 10 * time.Second
|
||||
|
||||
// Config holds configuration for the Slack notifier.
|
||||
type Config struct {
|
||||
// WebhookURL is the Slack incoming webhook URL.
|
||||
@@ -29,11 +37,46 @@ type Notifier struct {
|
||||
}
|
||||
|
||||
// New creates a new Slack notifier.
|
||||
//
|
||||
// Bundle 5 closure (audit R7): the HTTP transport now wraps
|
||||
// validation.SafeHTTPDialContext so outbound webhook calls cannot be
|
||||
// pointed at reserved-address ranges (cloud metadata 169.254.169.254,
|
||||
// in-cluster ::1 / 127.0.0.1 / 10.0.0.0/8 / 172.16.0.0/12 /
|
||||
// 192.168.0.0/16, IPv6 link-local fe80::/10) via DNS rebinding or
|
||||
// operator-supplied raw IPs. Webhook URLs are operator-configured but
|
||||
// flow through the dynamic-config GUI (issuers + targets) which
|
||||
// untrusted-actor edits can reach with the right RBAC scope; without
|
||||
// the dial-time guard, a notifier config update would be an SSRF
|
||||
// pivot into instance metadata services. Mirrors the
|
||||
// internal/connector/notifier/webhook hardening pattern.
|
||||
func New(config Config) *Notifier {
|
||||
transport := &http.Transport{
|
||||
DialContext: validation.SafeHTTPDialContext(slackClientTimeout),
|
||||
MaxIdleConns: 10,
|
||||
IdleConnTimeout: 90 * time.Second,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
ExpectContinueTimeout: 1 * time.Second,
|
||||
}
|
||||
return &Notifier{
|
||||
config: config,
|
||||
httpClient: &http.Client{
|
||||
Timeout: 10 * time.Second,
|
||||
Timeout: slackClientTimeout,
|
||||
Transport: transport,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// newForTest is the test-only constructor that bypasses the
|
||||
// SafeHTTPDialContext guard so unit tests using httptest.NewServer
|
||||
// (which binds to 127.0.0.1) can exercise the rest of the notifier
|
||||
// path. The exported `New` is the only production constructor and
|
||||
// installs the dial-time SSRF guard unconditionally. Mirrors the
|
||||
// internal/connector/notifier/webhook seam (newForTest there).
|
||||
func newForTest(config Config) *Notifier {
|
||||
return &Notifier{
|
||||
config: config,
|
||||
httpClient: &http.Client{
|
||||
Timeout: slackClientTimeout,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
@@ -34,7 +34,11 @@ func TestSlack_SendSuccess(t *testing.T) {
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
n := New(Config{WebhookURL: server.URL})
|
||||
// Bundle 5 closure (R7): production `New` installs the SSRF dial-time
|
||||
// guard which refuses httptest.NewServer's 127.0.0.1 bind. The
|
||||
// unexported `newForTest` constructor bypasses the guard for unit
|
||||
// tests that exercise the rest of the notifier path.
|
||||
n := newForTest(Config{WebhookURL: server.URL})
|
||||
err := n.Send(context.Background(), "ops@example.com", "Cert Expiring", "mc-api-prod expires in 7 days")
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %v", err)
|
||||
@@ -57,7 +61,7 @@ func TestSlack_SendWithOverrides(t *testing.T) {
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
n := New(Config{
|
||||
n := newForTest(Config{
|
||||
WebhookURL: server.URL,
|
||||
ChannelOverride: "#alerts",
|
||||
Username: "certctl-bot",
|
||||
@@ -86,7 +90,7 @@ func TestSlack_SendHTTPError(t *testing.T) {
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
n := New(Config{WebhookURL: server.URL})
|
||||
n := newForTest(Config{WebhookURL: server.URL})
|
||||
err := n.Send(context.Background(), "", "Test", "body")
|
||||
if err == nil {
|
||||
t.Fatal("expected error, got nil")
|
||||
@@ -97,7 +101,7 @@ func TestSlack_SendHTTPError(t *testing.T) {
|
||||
}
|
||||
|
||||
func TestSlack_SendConnectionError(t *testing.T) {
|
||||
n := New(Config{WebhookURL: "http://127.0.0.1:1"})
|
||||
n := newForTest(Config{WebhookURL: "http://127.0.0.1:1"})
|
||||
err := n.Send(context.Background(), "", "Test", "body")
|
||||
if err == nil {
|
||||
t.Fatal("expected connection error, got nil")
|
||||
|
||||
@@ -8,8 +8,15 @@ import (
|
||||
"io"
|
||||
"net/http"
|
||||
"time"
|
||||
|
||||
"github.com/certctl-io/certctl/internal/validation"
|
||||
)
|
||||
|
||||
// teamsClientTimeout bounds every outbound Teams webhook request and
|
||||
// its resolution/dial phase. Shared by the SSRF-safe transport dialer
|
||||
// (Bundle 5 R7 closure) and the http.Client.
|
||||
const teamsClientTimeout = 10 * time.Second
|
||||
|
||||
// Config holds configuration for the Microsoft Teams notifier.
|
||||
type Config struct {
|
||||
// WebhookURL is the Teams incoming webhook URL.
|
||||
@@ -23,11 +30,35 @@ type Notifier struct {
|
||||
}
|
||||
|
||||
// New creates a new Teams notifier.
|
||||
//
|
||||
// Bundle 5 closure (audit R7): SSRF-safe transport — see the parallel
|
||||
// rationale in internal/connector/notifier/slack.New. Webhook URLs are
|
||||
// operator-configured via the dynamic-config GUI and must not pivot
|
||||
// into cloud metadata services or in-cluster reserved ranges.
|
||||
func New(config Config) *Notifier {
|
||||
transport := &http.Transport{
|
||||
DialContext: validation.SafeHTTPDialContext(teamsClientTimeout),
|
||||
MaxIdleConns: 10,
|
||||
IdleConnTimeout: 90 * time.Second,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
ExpectContinueTimeout: 1 * time.Second,
|
||||
}
|
||||
return &Notifier{
|
||||
config: config,
|
||||
httpClient: &http.Client{
|
||||
Timeout: 10 * time.Second,
|
||||
Timeout: teamsClientTimeout,
|
||||
Transport: transport,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// newForTest bypasses the SSRF dial-time guard for unit tests that hit
|
||||
// httptest.NewServer (binds to 127.0.0.1). Production uses `New`.
|
||||
func newForTest(config Config) *Notifier {
|
||||
return &Notifier{
|
||||
config: config,
|
||||
httpClient: &http.Client{
|
||||
Timeout: teamsClientTimeout,
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
@@ -34,7 +34,8 @@ func TestTeams_SendSuccess(t *testing.T) {
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
n := New(Config{WebhookURL: server.URL})
|
||||
// Bundle 5 R7: bypass the SSRF dial-time guard for httptest's 127.0.0.1 server.
|
||||
n := newForTest(Config{WebhookURL: server.URL})
|
||||
err := n.Send(context.Background(), "team@example.com", "Renewal Failed", "Certificate mc-api-prod renewal failed after 3 attempts")
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %v", err)
|
||||
@@ -70,7 +71,7 @@ func TestTeams_SendHTTPError(t *testing.T) {
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
n := New(Config{WebhookURL: server.URL})
|
||||
n := newForTest(Config{WebhookURL: server.URL})
|
||||
err := n.Send(context.Background(), "", "Test", "body")
|
||||
if err == nil {
|
||||
t.Fatal("expected error, got nil")
|
||||
@@ -81,7 +82,7 @@ func TestTeams_SendHTTPError(t *testing.T) {
|
||||
}
|
||||
|
||||
func TestTeams_SendConnectionError(t *testing.T) {
|
||||
n := New(Config{WebhookURL: "http://127.0.0.1:1"})
|
||||
n := newForTest(Config{WebhookURL: "http://127.0.0.1:1"})
|
||||
err := n.Send(context.Background(), "", "Test", "body")
|
||||
if err == nil {
|
||||
t.Fatal("expected connection error, got nil")
|
||||
|
||||
Reference in New Issue
Block a user