security(email): sanitize body fields against content injection (CodeQL #11, CWE-640)

CodeQL alert #11 (go/email-injection, CWE-640 / OWASP Content Spoofing) flagged the wc.Write(message) sink at internal/connector/notifier/email/ email.go:208 because attacker-controllable fields flow into the email body unchecked. Threat model: Headers (From, To, Subject) were already protected by validation.ValidateHeaderValue (CWE-113 SMTP header injection, closed in commit 3853b74). The remaining gap was the body. An attacker controls multiple fields that surface to the body of alert/event notifications: - alert.Subject, alert.Message - event.Subject, event.Body, *event.CertificateID - alert.Metadata + event.Metadata key/value pairs These can carry CR/LF (forged 'Reply-To: attacker@evil.com' inside the body that recipients skim), NUL bytes (RFC 5321 4.5.2 violation that some MTAs truncate at), bidi-override Unicode (visually- spoofable URLs), zero-width / invisible Unicode (phishing), or malformed UTF-8 (Go emits U+FFFD which becomes a glyph in mail clients). The HTML email path (digest service) already uses html/template upstream and is safe via contextual auto-escape. This commit closes the plaintext path. Fix: internal/validation/headers.go gains SanitizeEmailBodyValue — a sanitizer that NEVER errors (the right contract for body content; over-eager rejection drops operator notifications) and scrubs: - NUL bytes (stripped entirely) - bare CR / LF (replaced with space — single fields should never carry their own line breaks; the surrounding template handles legitimate CRLFs) - C0 control chars < 0x20 except TAB - DEL (0x7F) + C1 control chars (0x80-0x9F) - U+FFFD (defense in depth: malformed UTF-8 -> Go emits this; strip so attacker-planted invalid bytes don't survive as an arbitrary glyph) - Bidi-override Unicode (U+202A..U+202E, U+2066..U+2069) - Zero-width / invisible Unicode (U+200B..U+200D, U+2060..U+2063, U+FEFF, U+180E) - Catch-all unicode.IsControl for anything not enumerated above Codepoint table uses numeric ranges rather than rune-literal switch cases — Go source rejects literal invisible characters (BOM U+FEFF) mid-file, so the table compares against numeric values. internal/connector/notifier/email/email.go applies the sanitizer at every interpolation site: - formatAlertBody: alert.ID/Type/Severity/Subject/Message (CreatedAt is time.Time -> RFC3339, structural, not sanitized) - formatEventBody: event.ID/Type/Subject/Body, *CertificateID (CreatedAt structural, not sanitized) - formatMetadata: both keys and values The sendEmail / formatEmailMessage call sites continue to validate headers (From / To / Subject) via the existing ValidateHeaderValue fail-closed gate; the new sanitizer is body-side only. Tests (internal/validation/headers_test.go): TestSanitizeEmailBodyValue_PreservesSafeInput Pin: ordinary ASCII, UTF-8 multibyte (résumé / 日本語 / مرحبا), tabs, common cert DNs, URLs all flow through unchanged. TestSanitizeEmailBodyValue_StripsControlChars Table-driven across NUL, bare LF/CR, CRLF, BEL, backspace, DEL, C1 (U+0080 / U+009F), U+FFFD, TAB-preserve. TestSanitizeEmailBodyValue_StripsBidiOverride 7 attacker payloads (RLO, LRO, LRI, zero-width space, ZWNJ, BOM, MVS) — each must produce a non-identity output. TestSanitizeEmailBodyValue_ContentSpoofingScenario The CodeQL example case: 'alert\r\nReply-To: attacker@evil.com\r\n Click https://evil.example.com/reset' — verify NO CR/LF survives. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/validation/...: ok 0.374s go test -short -count=1 ./internal/connector/notifier/email/...: ok 0.186s Reference: https://github.com/certctl-io/certctl/security/code-scanning/11 Closes CodeQL alert #11 (go/email-injection).
2026-06-07 15:01:32 +00:00 · 2026-05-04 04:56:13 +00:00
parent e50ba168ac
commit 23c593089d
3 changed files with 297 additions and 4 deletions
@@ -68,3 +68,127 @@ func TestValidateHeaderValue_DefaultFieldName(t *testing.T) {
 		t.Errorf("expected default field name 'header' in error, got %q", err.Error())
 	}
 }
+
+// TestSanitizeEmailBodyValue_PreservesSafeInput pins the contract that
+// ordinary body content (including non-Latin scripts and tabs) flows
+// through unchanged. The sanitizer must be a no-op for legitimate input
+// — over-stripping degrades operator notifications.
+func TestSanitizeEmailBodyValue_PreservesSafeInput(t *testing.T) {
+	tests := []struct {
+		name  string
+		input string
+	}{
+		{"plain ASCII", "Renewal reminder for prod.example.com"},
+		{"empty", ""},
+		{"utf-8 multibyte", "résumé — 日本語 — مرحبا"},
+		{"tabs allowed", "key:\tvalue"},
+		{"spaces", "  multiple  spaces  "},
+		{"common cert DN", "CN=api.example.com,O=Acme Corp,C=US"},
+		{"URL with safe chars", "https://docs.example.com/cert/mc-prod-api"},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			got := SanitizeEmailBodyValue(tc.input)
+			if got != tc.input {
+				t.Errorf("expected unchanged %q, got %q", tc.input, got)
+			}
+		})
+	}
+}
+
+// TestSanitizeEmailBodyValue_StripsControlChars pins the CodeQL
+// go/email-injection (CWE-640) defense — every attacker-plant-able
+// control character is stripped or replaced.
+func TestSanitizeEmailBodyValue_StripsControlChars(t *testing.T) {
+	tests := []struct {
+		name      string
+		input     string
+		want      string
+		wantSafer bool // want output != input
+	}{
+		{"NUL byte stripped", "before\x00after", "beforeafter", true},
+		{"bare LF replaced with space", "line1\nline2", "line1 line2", true},
+		{"bare CR replaced with space", "line1\rline2", "line1 line2", true},
+		{"CRLF replaced (both stripped)", "line1\r\nline2", "line1  line2", true},
+		{"BEL stripped", "alert\x07now", "alertnow", true},
+		{"backspace stripped", "x\x08y", "xy", true},
+		{"DEL stripped", "x\x7fy", "xy", true},
+		// C1 control chars must be specified via Unicode escape (\u) so
+		// the source remains valid UTF-8; bare \x80 / \x9f bytes would
+		// be invalid UTF-8 and Go's range emits U+FFFD instead, which
+		// would test the malformed-UTF-8 strip path, not the C1 path.
+		{"C1 control char stripped (U+0080)", "x\u0080y", "xy", true},
+		{"C1 control char stripped (U+009F)", "x\u009Fy", "xy", true},
+		// U+FFFD is the replacement char Go emits for malformed UTF-8.
+		// We strip it as defense-in-depth so attacker-planted invalid
+		// UTF-8 doesn't survive into operator notifications as an
+		// arbitrary glyph.
+		{"replacement char stripped", "x\uFFFDy", "xy", true},
+		{"TAB preserved (legitimate body content)", "k:\tv", "k:\tv", false},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			got := SanitizeEmailBodyValue(tc.input)
+			if got != tc.want {
+				t.Errorf("input %q: want %q, got %q", tc.input, tc.want, got)
+			}
+			if tc.wantSafer && got == tc.input {
+				t.Errorf("expected sanitization to change %q, but output unchanged", tc.input)
+			}
+		})
+	}
+}
+
+// TestSanitizeEmailBodyValue_StripsBidiOverride pins the
+// visually-spoofable Unicode defense (homograph / RTL-override /
+// zero-width attacks). An attacker who controls a CN or metadata value
+// could otherwise plant a malicious URL that renders benignly in mail
+// clients that honor bidi-override codepoints.
+func TestSanitizeEmailBodyValue_StripsBidiOverride(t *testing.T) {
+	tests := []struct {
+		name  string
+		input string
+	}{
+		// U+202E = Right-to-left override
+		{"RTL override", "Click ‮www.evil.com‬ to verify"},
+		// U+202D = Left-to-right override
+		{"LRO override", "Click ‭www.evil.com‬ to verify"},
+		// U+2066 = Left-to-right isolate
+		{"LRI isolate", "Click ⁦www.evil.com⁩ to verify"},
+		// U+200B = Zero-width space
+		{"zero-width space", "evil.example.com"},
+		// U+200C = ZWNJ
+		{"zero-width non-joiner", "ad‌min@example.com"},
+		// U+FEFF = byte-order mark / zero-width no-break space
+		{"BOM", "x\uFEFFy"},
+		// U+180E = Mongolian Vowel Separator
+		{"MVS", "a᠎b"},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			got := SanitizeEmailBodyValue(tc.input)
+			if got == tc.input {
+				t.Errorf("expected bidi/zero-width stripping for %q, got unchanged %q", tc.input, got)
+			}
+		})
+	}
+}
+
+// TestSanitizeEmailBodyValue_ContentSpoofingScenario pins the specific
+// CodeQL go/email-injection (CWE-640) example: an attacker who controls
+// a body field plants header-like content. The sanitizer neutralizes
+// the attempt by stripping bare LF/CR within the field.
+func TestSanitizeEmailBodyValue_ContentSpoofingScenario(t *testing.T) {
+	// Attacker plants a body value that tries to fake a "Reply-To"
+	// header inside the body. Even if mail clients don't honor it, a
+	// recipient skimming the body could be fooled.
+	attacker := "alert from compromised cert\r\nReply-To: attacker@evil.com\r\nClick https://evil.example.com/reset"
+	got := SanitizeEmailBodyValue(attacker)
+	if got == attacker {
+		t.Fatalf("attacker input passed through unchanged: %q", got)
+	}
+	// Specifically: no CR or LF should remain in the field.
+	if strings.ContainsAny(got, "\r\n") {
+		t.Errorf("CR/LF still present after sanitization: %q", got)
+	}
+}