security(email): sanitize body fields against content injection (CodeQL #11, CWE-640)

CodeQL alert #11 (go/email-injection, CWE-640 / OWASP Content Spoofing)
flagged the wc.Write(message) sink at internal/connector/notifier/email/
email.go:208 because attacker-controllable fields flow into the email
body unchecked.

Threat model:
  Headers (From, To, Subject) were already protected by
  validation.ValidateHeaderValue (CWE-113 SMTP header injection,
  closed in commit 3853b74). The remaining gap was the body.
  An attacker controls multiple fields that surface to the body of
  alert/event notifications:
    - alert.Subject, alert.Message
    - event.Subject, event.Body, *event.CertificateID
    - alert.Metadata + event.Metadata key/value pairs
  These can carry CR/LF (forged 'Reply-To: attacker@evil.com' inside
  the body that recipients skim), NUL bytes (RFC 5321 4.5.2 violation
  that some MTAs truncate at), bidi-override Unicode (visually-
  spoofable URLs), zero-width / invisible Unicode (phishing), or
  malformed UTF-8 (Go emits U+FFFD which becomes a glyph in mail
  clients).

  The HTML email path (digest service) already uses html/template
  upstream and is safe via contextual auto-escape. This commit
  closes the plaintext path.

Fix:
  internal/validation/headers.go gains SanitizeEmailBodyValue —
  a sanitizer that NEVER errors (the right contract for body
  content; over-eager rejection drops operator notifications) and
  scrubs:
    - NUL bytes (stripped entirely)
    - bare CR / LF (replaced with space — single fields should never
      carry their own line breaks; the surrounding template handles
      legitimate CRLFs)
    - C0 control chars < 0x20 except TAB
    - DEL (0x7F) + C1 control chars (0x80-0x9F)
    - U+FFFD (defense in depth: malformed UTF-8 -> Go emits this;
      strip so attacker-planted invalid bytes don't survive as an
      arbitrary glyph)
    - Bidi-override Unicode (U+202A..U+202E, U+2066..U+2069)
    - Zero-width / invisible Unicode (U+200B..U+200D, U+2060..U+2063,
      U+FEFF, U+180E)
    - Catch-all unicode.IsControl for anything not enumerated above
  Codepoint table uses numeric ranges rather than rune-literal switch
  cases — Go source rejects literal invisible characters (BOM U+FEFF)
  mid-file, so the table compares against numeric values.

  internal/connector/notifier/email/email.go applies the sanitizer
  at every interpolation site:
    - formatAlertBody: alert.ID/Type/Severity/Subject/Message
      (CreatedAt is time.Time -> RFC3339, structural, not sanitized)
    - formatEventBody: event.ID/Type/Subject/Body, *CertificateID
      (CreatedAt structural, not sanitized)
    - formatMetadata: both keys and values
  The sendEmail / formatEmailMessage call sites continue to validate
  headers (From / To / Subject) via the existing ValidateHeaderValue
  fail-closed gate; the new sanitizer is body-side only.

Tests (internal/validation/headers_test.go):
  TestSanitizeEmailBodyValue_PreservesSafeInput
    Pin: ordinary ASCII, UTF-8 multibyte (résumé / 日本語 / مرحبا),
    tabs, common cert DNs, URLs all flow through unchanged.
  TestSanitizeEmailBodyValue_StripsControlChars
    Table-driven across NUL, bare LF/CR, CRLF, BEL, backspace, DEL,
    C1 (U+0080 / U+009F), U+FFFD, TAB-preserve.
  TestSanitizeEmailBodyValue_StripsBidiOverride
    7 attacker payloads (RLO, LRO, LRI, zero-width space, ZWNJ, BOM,
    MVS) — each must produce a non-identity output.
  TestSanitizeEmailBodyValue_ContentSpoofingScenario
    The CodeQL example case: 'alert\r\nReply-To: attacker@evil.com\r\n
    Click https://evil.example.com/reset' — verify NO CR/LF survives.

Verified locally:
  gofmt: clean.
  go vet ./...: exit 0.
  go test -short -count=1 ./internal/validation/...: ok 0.374s
  go test -short -count=1 ./internal/connector/notifier/email/...: ok 0.186s

Reference: https://github.com/certctl-io/certctl/security/code-scanning/11
Closes CodeQL alert #11 (go/email-injection).
This commit is contained in:
shankar0123
2026-05-04 04:56:13 +00:00
parent e50ba168ac
commit 23c593089d
3 changed files with 297 additions and 4 deletions
+50 -4
View File
@@ -356,6 +356,21 @@ func (c *Connector) formatHTMLEmailMessage(from, to, subject, htmlBody string) (
}
// formatAlertBody formats an alert notification as email body text.
//
// CodeQL go/email-injection (CWE-640 / OWASP Content Spoofing) defense:
// every field interpolated into the body that may carry attacker-
// controlled content (alert.Subject, alert.Message, alert.Metadata
// values, alert.ID / Type / Severity which originate from the API
// surface) is routed through validation.SanitizeEmailBodyValue before
// formatting. The sanitizer strips NUL bytes (RFC 5321 §4.5.2 violation),
// bare CR/LF within a single field (forged header-boundary attempts),
// bidi-override Unicode (visually-spoofable URLs), zero-width / invisible
// codepoints, and C0/C1 control chars. CreatedAt is a time.Time —
// formatted via RFC3339; not user-controllable so unsanitized.
//
// Header values (From, To, Subject) are protected separately by
// validation.ValidateHeaderValue at sendEmail entry (CWE-113 SMTP header
// injection — see commit 9e957c3).
func (c *Connector) formatAlertBody(alert notifier.Alert) string {
body := fmt.Sprintf(`
Certificate Alert Notification
@@ -372,16 +387,29 @@ Message:
%s
%s
`, alert.ID, alert.Type, alert.Severity, alert.CreatedAt.Format(time.RFC3339), alert.Subject, alert.Message, c.formatMetadata(alert.Metadata))
`,
validation.SanitizeEmailBodyValue(alert.ID),
validation.SanitizeEmailBodyValue(alert.Type),
validation.SanitizeEmailBodyValue(alert.Severity),
alert.CreatedAt.Format(time.RFC3339),
validation.SanitizeEmailBodyValue(alert.Subject),
validation.SanitizeEmailBodyValue(alert.Message),
c.formatMetadata(alert.Metadata),
)
return body
}
// formatEventBody formats an event notification as email body text.
//
// Same CodeQL go/email-injection mitigation as formatAlertBody — every
// user-controllable interpolated field routes through
// validation.SanitizeEmailBodyValue. CreatedAt is unsanitized (time.Time
// → RFC3339 is structural, not user-controllable).
func (c *Connector) formatEventBody(event notifier.Event) string {
certInfo := ""
if event.CertificateID != nil {
certInfo = fmt.Sprintf("Certificate ID: %s\n", *event.CertificateID)
certInfo = fmt.Sprintf("Certificate ID: %s\n", validation.SanitizeEmailBodyValue(*event.CertificateID))
}
body := fmt.Sprintf(`
@@ -398,12 +426,27 @@ Body:
%s
%s
`, event.ID, event.Type, event.CreatedAt.Format(time.RFC3339), certInfo, event.Subject, event.Body, c.formatMetadata(event.Metadata))
`,
validation.SanitizeEmailBodyValue(event.ID),
validation.SanitizeEmailBodyValue(event.Type),
event.CreatedAt.Format(time.RFC3339),
certInfo,
validation.SanitizeEmailBodyValue(event.Subject),
validation.SanitizeEmailBodyValue(event.Body),
c.formatMetadata(event.Metadata),
)
return body
}
// formatMetadata formats metadata as a readable string.
//
// Both keys and values can carry attacker-controlled content (cert
// subject DN fragments, discovered cert metadata, owner/team labels —
// all originate from API surfaces an attacker may influence). Both are
// routed through validation.SanitizeEmailBodyValue. Closes the
// CodeQL go/email-injection finding alongside formatAlertBody +
// formatEventBody.
func (c *Connector) formatMetadata(metadata map[string]string) string {
if len(metadata) == 0 {
return ""
@@ -411,7 +454,10 @@ func (c *Connector) formatMetadata(metadata map[string]string) string {
metadataStr := "\nMetadata:\n"
for key, value := range metadata {
metadataStr += fmt.Sprintf(" %s: %s\n", key, value)
metadataStr += fmt.Sprintf(" %s: %s\n",
validation.SanitizeEmailBodyValue(key),
validation.SanitizeEmailBodyValue(value),
)
}
return metadataStr
+123
View File
@@ -3,6 +3,7 @@ package validation
import (
"fmt"
"strings"
"unicode"
)
// ValidateHeaderValue rejects any value that contains characters capable of
@@ -34,3 +35,125 @@ func ValidateHeaderValue(field, value string) error {
}
return nil
}
// SanitizeEmailBodyValue scrubs control characters and visually-spoofable
// Unicode from a single field that will be interpolated into a plaintext
// email body. Closes CodeQL go/email-injection (CWE-640 / OWASP Content
// Spoofing): an attacker who controls a field surfaced to an
// operator-bound notification (cert subject DN, discovered cert metadata,
// alert subject / message, event subject / body, metadata key+value
// pairs) could otherwise plant content that:
//
// - Forges header-like content using bare CR/LF (some mail relays
// misinterpret bare LF mid-body as a header boundary; RFC 5321
// mandates CRLF, but defense in depth says strip bare LFs).
// - Embeds NUL bytes (forbidden by RFC 5321 sec 4.5.2; some MTAs
// truncate at NUL, allowing content elision).
// - Plants bidi-override Unicode (U+202A..U+202E, U+2066..U+2069) so a
// malicious URL renders as a benign one in the recipient's mail
// client.
// - Plants zero-width / invisible Unicode (U+200B..U+200D, U+FEFF,
// U+2060..U+2063) so a phishing-prone URL hides whitespace.
// - Plants C0 / C1 control characters that mail clients may render
// unpredictably or strip in surprising ways.
//
// The sanitizer NEVER errors; it always returns a sanitized string. This
// is the right contract for body content (vs. headers, which fail loud)
// because dropping a notification because the cert subject DN happens to
// contain a Mongolian Vowel Separator would be worse than escaping it.
//
// What the sanitizer does:
//
// - Strip NUL bytes (\x00) entirely.
// - Replace bare LF / CR with a single space. Multi-line legitimate
// body content gets its CRLF formatting from the email serializer
// above this layer; a SINGLE FIELD interpolated into the body
// should never carry its own line breaks.
// - Strip bidi-override and zero-width characters.
// - Strip C0 control chars (< 0x20) except TAB. Strip DEL (0x7F) +
// C1 control chars (0x80-0x9F).
// - Leave ordinary printable Unicode (including non-Latin scripts)
// intact.
//
// Apply this to EVERY user-controllable field before interpolating into
// a plaintext email body. Do NOT apply it to operator-controlled
// constants (template literals, severity tier names) — those don't
// carry the threat. The HTML email path uses html/template upstream
// and does not need this sanitizer (html/template's contextual
// auto-escape handles the same threats for HTML rendering).
func SanitizeEmailBodyValue(value string) string {
if value == "" {
return value
}
var b strings.Builder
b.Grow(len(value))
for _, r := range value {
switch {
case r == 0:
// NUL — strip entirely (RFC 5321 sec 4.5.2 violation).
continue
case r == '\r' || r == '\n':
// Strip line breaks within a single interpolated field.
b.WriteRune(' ')
case r == '\t':
// TAB is legitimate body content.
b.WriteRune(r)
case r < 0x20:
// C0 control chars (except TAB above) — strip.
continue
case r >= 0x7F && r <= 0x9F:
// DEL + C1 control chars — strip.
continue
case r == 0xFFFD:
// Replacement character — Go's range emits this for any
// malformed UTF-8 byte sequence. Defense in depth: an
// attacker who plants invalid UTF-8 (e.g. raw 0x80..0xFF
// without a valid lead byte) should not have their input
// surface as an arbitrary glyph in operator-bound mail.
continue
case isBidiOrZeroWidth(r):
// Bidi-override + zero-width — strip; visually spoofable.
continue
case unicode.IsControl(r):
// Catch-all: any remaining Unicode control class.
continue
default:
b.WriteRune(r)
}
}
return b.String()
}
// isBidiOrZeroWidth reports whether r is one of the bidi-override or
// zero-width Unicode codepoints used in homograph / direction-spoofing
// attacks. Mirrors the validator in internal/connector/issuer/local
// (validateCSRUnicode); kept inline here to avoid a new import edge
// from internal/validation back to the local issuer package.
//
// Codepoints expressed as numeric ranges instead of rune-literal
// switch cases — Go source rejects literal invisible characters
// (e.g. BOM U+FEFF) mid-file, so we compare against numeric values.
func isBidiOrZeroWidth(r rune) bool {
switch {
// LRE U+202A, RLE U+202B, PDF U+202C, LRO U+202D, RLO U+202E
case r >= 0x202A && r <= 0x202E:
return true
// LRI U+2066, RLI U+2067, FSI U+2068, PDI U+2069
case r >= 0x2066 && r <= 0x2069:
return true
// Zero-width space U+200B, ZWNJ U+200C, ZWJ U+200D
case r >= 0x200B && r <= 0x200D:
return true
// Word joiner U+2060, invisible separator U+2061,
// invisible times U+2062, invisible plus U+2063
case r >= 0x2060 && r <= 0x2063:
return true
// Byte-order mark / zero-width no-break space U+FEFF
case r == 0xFEFF:
return true
// Mongolian Vowel Separator U+180E
case r == 0x180E:
return true
}
return false
}
+124
View File
@@ -68,3 +68,127 @@ func TestValidateHeaderValue_DefaultFieldName(t *testing.T) {
t.Errorf("expected default field name 'header' in error, got %q", err.Error())
}
}
// TestSanitizeEmailBodyValue_PreservesSafeInput pins the contract that
// ordinary body content (including non-Latin scripts and tabs) flows
// through unchanged. The sanitizer must be a no-op for legitimate input
// — over-stripping degrades operator notifications.
func TestSanitizeEmailBodyValue_PreservesSafeInput(t *testing.T) {
tests := []struct {
name string
input string
}{
{"plain ASCII", "Renewal reminder for prod.example.com"},
{"empty", ""},
{"utf-8 multibyte", "résumé — 日本語 — مرحبا"},
{"tabs allowed", "key:\tvalue"},
{"spaces", " multiple spaces "},
{"common cert DN", "CN=api.example.com,O=Acme Corp,C=US"},
{"URL with safe chars", "https://docs.example.com/cert/mc-prod-api"},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := SanitizeEmailBodyValue(tc.input)
if got != tc.input {
t.Errorf("expected unchanged %q, got %q", tc.input, got)
}
})
}
}
// TestSanitizeEmailBodyValue_StripsControlChars pins the CodeQL
// go/email-injection (CWE-640) defense — every attacker-plant-able
// control character is stripped or replaced.
func TestSanitizeEmailBodyValue_StripsControlChars(t *testing.T) {
tests := []struct {
name string
input string
want string
wantSafer bool // want output != input
}{
{"NUL byte stripped", "before\x00after", "beforeafter", true},
{"bare LF replaced with space", "line1\nline2", "line1 line2", true},
{"bare CR replaced with space", "line1\rline2", "line1 line2", true},
{"CRLF replaced (both stripped)", "line1\r\nline2", "line1 line2", true},
{"BEL stripped", "alert\x07now", "alertnow", true},
{"backspace stripped", "x\x08y", "xy", true},
{"DEL stripped", "x\x7fy", "xy", true},
// C1 control chars must be specified via Unicode escape (\u) so
// the source remains valid UTF-8; bare \x80 / \x9f bytes would
// be invalid UTF-8 and Go's range emits U+FFFD instead, which
// would test the malformed-UTF-8 strip path, not the C1 path.
{"C1 control char stripped (U+0080)", "x\u0080y", "xy", true},
{"C1 control char stripped (U+009F)", "x\u009Fy", "xy", true},
// U+FFFD is the replacement char Go emits for malformed UTF-8.
// We strip it as defense-in-depth so attacker-planted invalid
// UTF-8 doesn't survive into operator notifications as an
// arbitrary glyph.
{"replacement char stripped", "x\uFFFDy", "xy", true},
{"TAB preserved (legitimate body content)", "k:\tv", "k:\tv", false},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := SanitizeEmailBodyValue(tc.input)
if got != tc.want {
t.Errorf("input %q: want %q, got %q", tc.input, tc.want, got)
}
if tc.wantSafer && got == tc.input {
t.Errorf("expected sanitization to change %q, but output unchanged", tc.input)
}
})
}
}
// TestSanitizeEmailBodyValue_StripsBidiOverride pins the
// visually-spoofable Unicode defense (homograph / RTL-override /
// zero-width attacks). An attacker who controls a CN or metadata value
// could otherwise plant a malicious URL that renders benignly in mail
// clients that honor bidi-override codepoints.
func TestSanitizeEmailBodyValue_StripsBidiOverride(t *testing.T) {
tests := []struct {
name string
input string
}{
// U+202E = Right-to-left override
{"RTL override", "Click www.evil.com to verify"},
// U+202D = Left-to-right override
{"LRO override", "Click www.evil.com to verify"},
// U+2066 = Left-to-right isolate
{"LRI isolate", "Click www.evil.com to verify"},
// U+200B = Zero-width space
{"zero-width space", "evil.example.com"},
// U+200C = ZWNJ
{"zero-width non-joiner", "admin@example.com"},
// U+FEFF = byte-order mark / zero-width no-break space
{"BOM", "x\uFEFFy"},
// U+180E = Mongolian Vowel Separator
{"MVS", "ab"},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := SanitizeEmailBodyValue(tc.input)
if got == tc.input {
t.Errorf("expected bidi/zero-width stripping for %q, got unchanged %q", tc.input, got)
}
})
}
}
// TestSanitizeEmailBodyValue_ContentSpoofingScenario pins the specific
// CodeQL go/email-injection (CWE-640) example: an attacker who controls
// a body field plants header-like content. The sanitizer neutralizes
// the attempt by stripping bare LF/CR within the field.
func TestSanitizeEmailBodyValue_ContentSpoofingScenario(t *testing.T) {
// Attacker plants a body value that tries to fake a "Reply-To"
// header inside the body. Even if mail clients don't honor it, a
// recipient skimming the body could be fooled.
attacker := "alert from compromised cert\r\nReply-To: attacker@evil.com\r\nClick https://evil.example.com/reset"
got := SanitizeEmailBodyValue(attacker)
if got == attacker {
t.Fatalf("attacker input passed through unchanged: %q", got)
}
// Specifically: no CR or LF should remain in the field.
if strings.ContainsAny(got, "\r\n") {
t.Errorf("CR/LF still present after sanitization: %q", got)
}
}