Skip to content

Avoid splitting surrogate pairs when shortening address labels#12511

Open
greymoth-jp wants to merge 1 commit into
openstreetmap:developfrom
greymoth-jp:fix-addr-label-surrogate-split
Open

Avoid splitting surrogate pairs when shortening address labels#12511
greymoth-jp wants to merge 1 commit into
openstreetmap:developfrom
greymoth-jp:fix-addr-label-surrogate-split

Conversation

@greymoth-jp

Copy link
Copy Markdown

The address-label shortening loop in modules/svg/labels.js trims one character at a time using substring with a UTF-16 .length count:

name = `${name.substring(0, name.replace(/$/, '').length - 1)}…`;

.length counts UTF-16 code units, so length - 1 can land in the middle of a surrogate pair. When the label ends in a character outside the BMP, that character is two code units, and substring keeps only the first half. The result is a lone surrogate that renders as .

This is reachable for real address points. utilDisplayName resolves the label from name, then addr:housename, then the house number, and both name and addr:housename are free text. A house or building name ending in an emoji, or in a CJK Extension B ideograph such as 𠮷 (U+20BB7, which appears in Japanese names), hits this case once the label is wide enough to be trimmed.

The fix uses Array.from(...).slice(0, -1) to drop the last code point instead, the same approach utilUnicodeCharsTruncated already takes in modules/util/util.ts. Strings within the BMP, including all BMP CJK, are unchanged.

Checked with node:

"ビル𠮷"      old: "ビル\uD842…"  (lone surrogate)   new: "ビル…"
"漢字テスト"   old: "漢字テス…"                        new: "漢字テス…"  (unchanged)

I did not add a test, since this runs inside the canvas label-rendering loop. utilUnicodeCharsTruncated, which uses the same technique, is covered in test/spec/util/util.js.

@matkoniecz

Copy link
Copy Markdown
Contributor

the same approach utilUnicodeCharsTruncated already takes in modules/util/util.ts

can utilUnicodeCharsTruncated be used here directly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants