Skip to content

Add HTML sanitizer for translated message resources#4

Open
zaibkhan wants to merge 1 commit into
feature-html-sanitizer-baselinefrom
feature-html-sanitizer-implementation
Open

Add HTML sanitizer for translated message resources#4
zaibkhan wants to merge 1 commit into
feature-html-sanitizer-baselinefrom
feature-html-sanitizer-implementation

Conversation

@zaibkhan

Copy link
Copy Markdown

No description provided.

Closes #37428

Signed-off-by: Alexander Schwartz <alexander.schwartz@gmx.net>
@codoki-pr-intelligence

codoki-pr-intelligence Bot commented Sep 10, 2025

Copy link
Copy Markdown

Codoki PR Review

Summary: Sanitize message HTML, fix malformed tags, reduce XSS risk
What’s good: The PR improves HTML correctness across localized message bundles (e.g., closing tags,
spacing) and appears to introduce a sanitizer for message properties, which reduces the risk of XSS in rendered emails/pages.
Review Status: ❌ Requires changes
Overall Priority: High
Review Update:
• Scope: Large PR detected (48 files)
• Coverage: Reviewed 30 highest-risk files across 3 batches
• Note: For full confidence, consider splitting this PR; current review covers top 30/48 files

This review covered the top 30 of 48 files (risk-ranked). For complete coverage and faster feedback, consider splitting into ~2 smaller PR(s).

Issues (Critical & High only)

Severity Issue Why it matters
High Correctness — Substring diff can throw StringIndexOutOfBoundsException …/themeverifier/VerifyMessageProperties.java
Potential StringIndexOutOfBoundsException when start + end exceeds the length of sanitized (or value) after sanitization removes content (e.g., stripping a whole tag), making substring begin > end. Bound the end indices to be at least start before calling substring.
High Correctness — Wrong language text in Lithuanian resource (Italian inser... …/messages/messages_lt.properties
This Lithuanian file contains an Italian sentence, which is a correctness issue and degrades UX. Also, the original guidance (listing FreeOTP/Google Authenticator and stores) is lost; we can remove HTML while keeping the brands as plain text.

Showing top 2 issues. Critical: 0, High: 2. See inline suggestions for more.

Key Feedback (click to expand)
  • Needs improvement: Add explicit constraints to the sanitizer for allowed tags/attributes (e.g., only with href and safe URI schemes) and document its behavior to avoid regressions. Consider scanning all locales for common typos introduced by escaped newlines shown as literal 'n'.
  • Testing: Add tests that: (1) reject anchors with javascript:, data:, vbscript:, and mixed-case scheme variants; (2) preserve and normalize safe anchors (http/https) including URL-encoded values; (3) strip or escape non-allowed tags (e.g., <script>, , <style>, on* attributes) while leaving plain text intact; (4) ensure locale parity by verifying that every *_Html key in non-English locales either matches English structure or is sanitized equivalently.
  • Documentation: Document the sanitizer’s allowlist (tags, attributes, URI schemes) and normalization rules (e.g., closing tags, spacing) in the theme-verifier module README or code-level Javadoc to guide translators and avoid reintroducing disallowed patterns.
  • Compatibility: Changing a malformed key like '***emailUpdateConfirmationBodyHtml' to 'emailUpdateConfirmationBodyHtml' aligns with other locales and should improve consistency; verify that no downstream integrations rely on the invalid key.
  • Security: Ensure the sanitizer enforces a strict allowlist (only ), removes all event handlers (on*), and validates href against a safe-scheme whitelist (http, https, mailto if desired). Normalize and decode inputs before validation to avoid bypass via obfuscation (e.g., unicode homoglyphs, mixed-case schemes, embedded control characters).
  • Open questions: Does the new sanitizer strictly block 'javascript:', 'data:', and other dangerous URL schemes in hrefs? Are non-anchor tags stripped or escaped consistently, and is there a canonical allowlist shared across all theme verifications?

Confidence: 3/5 — Needs work before merge (2 high · status: Requires changes · scope: top 30/48 files reviewed)

React with 👍 or 👎 if you found this review useful.

while (end < sanitized.length() && end < value.length() && value.charAt(value.length() - end - 1) == sanitized.charAt(sanitized.length() - end - 1)) {
end++;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ High: Potential StringIndexOutOfBoundsException when start + end exceeds the length of sanitized (or value) after sanitization removes content (e.g., stripping a whole tag), making substring begin > end. Bound the end indices to be at least start before calling substring.

Suggested change
```suggestion
int endValue = Math.max(start, value.length() - end);
int endSanitized = Math.max(start, sanitized.length() - end);
messages.add("Illegal HTML in key " + key + " for file " + file + ": '"
+ value.substring(start, endValue) + "' vs. '"
+ sanitized.substring(start, endSanitized) + "'");

try (FileInputStream fis = new FileInputStream(file)) {
bundle = new PropertyResourceBundle(fis);
} catch (IOException e) {
throw new RuntimeException("unable to read file " + file, e);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔷 Medium: The character class "[a-zA-Z-_]" creates an unintended range (Z to _) due to '-' placement, which may over-match. Also, anchoring to the end avoids accidental replacements earlier in the path.

Suggested change
throw new RuntimeException("unable to read file " + file, e);
```suggestion
String englishFile = file.getAbsolutePath()
.replace("resources-community", "resources")
.replaceFirst("_[A-Za-z_-]*\\.properties$", "_en.properties");


/**
* Allow only those anchor tags from the source key to also appear in the target key.
*/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔷 Medium: The method mutates 'value' inside a loop driven by a Matcher created from the original 'value'; since Matcher operates on the original CharSequence, subsequent replacements can desynchronize matching and produce incorrect removals with multiple anchors. Use the Matcher to perform replacements via appendReplacement/appendTail so the match context stays consistent.

suggestion
private String santizeAnchors(String key, String value, String englishValue) {
Matcher matcher = ANCHOR_PATTERN.matcher(value);
Matcher englishMatcher = ANCHOR_PATTERN.matcher(englishValue);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
if (englishMatcher.find() && Objects.equals(matcher.group(), englishMatcher.group())) {
matcher.appendReplacement(sb, "");
} else {
messages.add("Didn't find anchor tag " + matcher.group() + " in original string");
return value;
}
}
matcher.appendTail(sb);
return sb.toString();
}

configureAuthenticators=Sukonfigūruotas autentifikatorius
mobile=Mobilus
totpStep1=Įdiekite <a href="https://freeotp.github.io/" target="_blank">FreeOTP</a> arba Google Authenticator savo įrenginyje. Programėlės prieinamos <a href="https://play.google.com">Google Play</a> ir Apple App Store.
totpStep1=Installa una delle seguenti applicazioni sul tuo cellulare:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ High: This Lithuanian file contains an Italian sentence, which is a correctness issue and degrades UX. Also, the original guidance (listing FreeOTP/Google Authenticator and stores) is lost; we can remove HTML while keeping the brands as plain text.

Suggested change
totpStep1=Installa una delle seguenti applicazioni sul tuo cellulare:
totpStep1=Įdiekite FreeOTP arba Google Authenticator savo įrenginyje. Programėlės prieinamos Google Play ir Apple App Store.

configureAuthenticators=配置的认证者
mobile=手机
totpStep1=在你的设备上安装 <a href="https://fedorahosted.org/freeotp/" target="_blank">FreeOTP</a> 或者 Google Authenticator.两个应用可以从 <a href="https://play.google.com">Google Play</a> 和 Apple App Store下载。
totpStep1=在您的手機上安裝以下應用程式之一:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔷 Medium: This is Traditional Chinese text in a Simplified Chinese file (zh_CN). Use Simplified characters to avoid inconsistency and confusion.

Suggested change
totpStep1=在您的手機上安裝以下應用程式之一
totpStep1=在您的手机上安装以下应用之一

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants