fix(security): close BUNDLE 5 — auth, OIDC, MCP, API + browser security edges

Bundle 5 closure (2026-05-13 acquisition diligence audit). 13-finding
security audit pass across the auth / OIDC / MCP / API / browser-
security surface. Five real closures shipped in code, two false-as-
stated findings annotated with the existing implementation, three
operator-decision items documented for v3 follow-up, three doc-only
fixes (auth architecture narrative aligned with shipped OIDC).

Source findings closed (code):
  S1     break-glass /auth/breakglass/login lacked the documented
         5/min per-source-IP rate limit; handler now owns its own
         SlidingWindowLimiter wired at startup. Doc claim turns true.
  R6     OIDC test_discovery JWKS probe ran on http.DefaultClient;
         now uses an http.Client whose transport wraps
         validation.SafeHTTPDialContext. JWKS URI can no longer
         pivot into reserved-address ranges via DNS rebinding.
  R7     Slack + Teams notifiers built http.Client without the SSRF
         dial-time guard. Both New() constructors now install
         validation.SafeHTTPDialContext; webhook URLs (operator-
         configured via dynamic-config GUI) cannot dial 169.254.x or
         in-cluster reserved ranges. Test seam: newForTest bypasses
         the guard for httptest's 127.0.0.1 binds, mirroring the
         existing internal/connector/notifier/webhook pattern.
  RT-L2  CERTCTL_ACME_INSECURE=true now emits a prominent
         logger.Warn at server boot. Pre-Bundle-5 the knob silently
         disabled ACME directory TLS verification.

Source findings closed (doc):
  finding 1 + HIGH-5  Architecture doc claimed no in-process JWT/
         OIDC/mTLS/SAML and pointed everyone at the
         authenticating-gateway pattern. Auth Bundle 2
         (commit dea5053) shipped native OIDC + sessions +
         break-glass. New §"In-process authentication surface"
         table (api-key / oidc / none) supersedes the old framing;
         "Authenticating-gateway pattern (SAML, mTLS-as-auth,
         LDAP)" section retained for protocols certctl still
         doesn't ship natively.

Source findings verified false (existing implementation):
  S4     OIDC email-domain allowlist — `email_domain_test.go`
         already pins the strict-equality semantics (subdomain not
         auto-accepted, multi-entry no-match path, empty allowlist
         accepts all by-design per RFC 9700 §4.1.1).
  SEC-L1 CSP / HSTS / referrer-policy headers — already shipped at
         internal/api/middleware/securityheaders.go and wired at
         cmd/server/main.go L2003+L2027+L2115.

Operator-decision / deferred (tracked in bundle-5 closure doc):
  S3     CERTCTL_API_KEYS_NAMED parsing is wired, end-to-end
         validation is partial. Operator decides: complete the
         named-key middleware path or deprecate the syntax.
  S5     Audit-middleware best-effort for read paths;
         security-critical writes use WithinTx. Operator decides
         per-path escalation.
  S8     MCP threat model — the binary is a thin protocol bridge,
         no privileges of its own; every tool call carries
         CERTCTL_API_KEY and is auth'd + RBAC-gated server-side.
         Optional CERTCTL_MCP_READ_ONLY gate tracked as v3.
  SEC-H1 2026-05-10 audit CRIT-1/2/4 already closed on master;
         CRIT-3/5 status against the spec folder is operator-
         workstation-validation-only. Documented for follow-up.
  SEC-L2 WebAuthn / FIDO2 / step-up — already documented in
         docs/operator/auth-threat-model.md "Threats Bundle 2 does
         NOT close". v3 work item per CLAUDE.md decision 12.

Full per-finding rationale + receipts at
docs/operator/security-bundle-5-audit-closure.md.

Verification:
  gofmt -l                                                # clean
  go vet ./internal/connector/notifier/slack
    ./internal/connector/notifier/teams ./internal/auth/oidc
    ./internal/api/handler ./cmd/server                  # clean
  go build ./cmd/server [...]                            # clean
  go test -short -count=1 ./internal/connector/notifier/slack
    ./internal/connector/notifier/teams ./internal/api/handler
    ./internal/auth/oidc ./internal/config                # PASS
                                                          # (slack 0.028s + teams
                                                          # 0.023s + handler 11.0s;
                                                          # newForTest seam keeps
                                                          # httptest tests green)

Audit-Closes: BUNDLE-5 S1 R6 R7 RT-L2 finding-1 HIGH-5
Audit-Verifies-False: S4 SEC-L1
Audit-Defers: S3 S5 S8 SEC-H1 SEC-L2
This commit is contained in:
shankar0123
2026-05-13 01:18:45 +00:00
parent 750478a6fe
commit 596e675ec7
9 changed files with 265 additions and 14 deletions
+44 -1
View File
@@ -8,8 +8,16 @@ import (
"io"
"net/http"
"time"
"github.com/certctl-io/certctl/internal/validation"
)
// slackClientTimeout bounds every outbound Slack webhook request and
// its resolution/dial phase. Shared by the transport dialer (SSRF
// guard) and the http.Client so DNS rebinding and the read/write
// budget land on the same time horizon.
const slackClientTimeout = 10 * time.Second
// Config holds configuration for the Slack notifier.
type Config struct {
// WebhookURL is the Slack incoming webhook URL.
@@ -29,11 +37,46 @@ type Notifier struct {
}
// New creates a new Slack notifier.
//
// Bundle 5 closure (audit R7): the HTTP transport now wraps
// validation.SafeHTTPDialContext so outbound webhook calls cannot be
// pointed at reserved-address ranges (cloud metadata 169.254.169.254,
// in-cluster ::1 / 127.0.0.1 / 10.0.0.0/8 / 172.16.0.0/12 /
// 192.168.0.0/16, IPv6 link-local fe80::/10) via DNS rebinding or
// operator-supplied raw IPs. Webhook URLs are operator-configured but
// flow through the dynamic-config GUI (issuers + targets) which
// untrusted-actor edits can reach with the right RBAC scope; without
// the dial-time guard, a notifier config update would be an SSRF
// pivot into instance metadata services. Mirrors the
// internal/connector/notifier/webhook hardening pattern.
func New(config Config) *Notifier {
transport := &http.Transport{
DialContext: validation.SafeHTTPDialContext(slackClientTimeout),
MaxIdleConns: 10,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
}
return &Notifier{
config: config,
httpClient: &http.Client{
Timeout: 10 * time.Second,
Timeout: slackClientTimeout,
Transport: transport,
},
}
}
// newForTest is the test-only constructor that bypasses the
// SafeHTTPDialContext guard so unit tests using httptest.NewServer
// (which binds to 127.0.0.1) can exercise the rest of the notifier
// path. The exported `New` is the only production constructor and
// installs the dial-time SSRF guard unconditionally. Mirrors the
// internal/connector/notifier/webhook seam (newForTest there).
func newForTest(config Config) *Notifier {
return &Notifier{
config: config,
httpClient: &http.Client{
Timeout: slackClientTimeout,
},
}
}
@@ -34,7 +34,11 @@ func TestSlack_SendSuccess(t *testing.T) {
}))
defer server.Close()
n := New(Config{WebhookURL: server.URL})
// Bundle 5 closure (R7): production `New` installs the SSRF dial-time
// guard which refuses httptest.NewServer's 127.0.0.1 bind. The
// unexported `newForTest` constructor bypasses the guard for unit
// tests that exercise the rest of the notifier path.
n := newForTest(Config{WebhookURL: server.URL})
err := n.Send(context.Background(), "ops@example.com", "Cert Expiring", "mc-api-prod expires in 7 days")
if err != nil {
t.Fatalf("unexpected error: %v", err)
@@ -57,7 +61,7 @@ func TestSlack_SendWithOverrides(t *testing.T) {
}))
defer server.Close()
n := New(Config{
n := newForTest(Config{
WebhookURL: server.URL,
ChannelOverride: "#alerts",
Username: "certctl-bot",
@@ -86,7 +90,7 @@ func TestSlack_SendHTTPError(t *testing.T) {
}))
defer server.Close()
n := New(Config{WebhookURL: server.URL})
n := newForTest(Config{WebhookURL: server.URL})
err := n.Send(context.Background(), "", "Test", "body")
if err == nil {
t.Fatal("expected error, got nil")
@@ -97,7 +101,7 @@ func TestSlack_SendHTTPError(t *testing.T) {
}
func TestSlack_SendConnectionError(t *testing.T) {
n := New(Config{WebhookURL: "http://127.0.0.1:1"})
n := newForTest(Config{WebhookURL: "http://127.0.0.1:1"})
err := n.Send(context.Background(), "", "Test", "body")
if err == nil {
t.Fatal("expected connection error, got nil")
+32 -1
View File
@@ -8,8 +8,15 @@ import (
"io"
"net/http"
"time"
"github.com/certctl-io/certctl/internal/validation"
)
// teamsClientTimeout bounds every outbound Teams webhook request and
// its resolution/dial phase. Shared by the SSRF-safe transport dialer
// (Bundle 5 R7 closure) and the http.Client.
const teamsClientTimeout = 10 * time.Second
// Config holds configuration for the Microsoft Teams notifier.
type Config struct {
// WebhookURL is the Teams incoming webhook URL.
@@ -23,11 +30,35 @@ type Notifier struct {
}
// New creates a new Teams notifier.
//
// Bundle 5 closure (audit R7): SSRF-safe transport — see the parallel
// rationale in internal/connector/notifier/slack.New. Webhook URLs are
// operator-configured via the dynamic-config GUI and must not pivot
// into cloud metadata services or in-cluster reserved ranges.
func New(config Config) *Notifier {
transport := &http.Transport{
DialContext: validation.SafeHTTPDialContext(teamsClientTimeout),
MaxIdleConns: 10,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
}
return &Notifier{
config: config,
httpClient: &http.Client{
Timeout: 10 * time.Second,
Timeout: teamsClientTimeout,
Transport: transport,
},
}
}
// newForTest bypasses the SSRF dial-time guard for unit tests that hit
// httptest.NewServer (binds to 127.0.0.1). Production uses `New`.
func newForTest(config Config) *Notifier {
return &Notifier{
config: config,
httpClient: &http.Client{
Timeout: teamsClientTimeout,
},
}
}
@@ -34,7 +34,8 @@ func TestTeams_SendSuccess(t *testing.T) {
}))
defer server.Close()
n := New(Config{WebhookURL: server.URL})
// Bundle 5 R7: bypass the SSRF dial-time guard for httptest's 127.0.0.1 server.
n := newForTest(Config{WebhookURL: server.URL})
err := n.Send(context.Background(), "team@example.com", "Renewal Failed", "Certificate mc-api-prod renewal failed after 3 attempts")
if err != nil {
t.Fatalf("unexpected error: %v", err)
@@ -70,7 +71,7 @@ func TestTeams_SendHTTPError(t *testing.T) {
}))
defer server.Close()
n := New(Config{WebhookURL: server.URL})
n := newForTest(Config{WebhookURL: server.URL})
err := n.Send(context.Background(), "", "Test", "body")
if err == nil {
t.Fatal("expected error, got nil")
@@ -81,7 +82,7 @@ func TestTeams_SendHTTPError(t *testing.T) {
}
func TestTeams_SendConnectionError(t *testing.T) {
n := New(Config{WebhookURL: "http://127.0.0.1:1"})
n := newForTest(Config{WebhookURL: "http://127.0.0.1:1"})
err := n.Send(context.Background(), "", "Test", "body")
if err == nil {
t.Fatal("expected connection error, got nil")