fix(deburr): handle ligatures and special Latin characters#199

Merged: maxdewald merged 3 commits into main from fix/deburr-ligatures on Mar 17, 2026

@maxdewald (Owner)

Problem

deburr relied on naive NFD normalization, which only handles characters that have a decomposed form (e.g. é → e). It failed completely for ligatures and special Latin letters that do not decompose under NFD:

deburr('Hællæ, hva skjera?') // ❌ 'Hællæ, hva skjera?' (unchanged!)
deburr('Straße')              // ❌ 'Straße' (unchanged!)
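
To make the failure mode concrete, here is a minimal sketch of what a naive NFD-based deburr looks like. The function name `naiveDeburr` and its body are assumptions for illustration; the PR does not show the code that was replaced.

```typescript
// Assumed shape of the old approach: decompose, then strip combining marks.
// This is a reconstruction for illustration, not the code removed by the PR.
function naiveDeburr(str: string): string {
    return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// U+00E9 (é) decomposes into 'e' + U+0301, so stripping the mark works.
console.log('\u00e9'.normalize('NFD').length);            // 2
console.log(naiveDeburr('caf\u00e9'));                    // 'cafe'

// U+00E6 (æ) and U+00DF (ß) have no canonical decomposition,
// so NFD returns them unchanged and nothing gets stripped.
console.log('\u00e6'.normalize('NFD') === '\u00e6');      // true
console.log(naiveDeburr('Stra\u00dfe') === 'Stra\u00dfe'); // true
```

Because NFD only applies canonical decompositions, no amount of mark-stripping after normalization can turn æ or ß into ASCII; they need an explicit mapping.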

This also affected kebabCase, camelCase, snakeCase, pascalCase, and titleCase which all use deburr internally.
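
The knock-on effect on the case helpers can be sketched as follows. Both `deburrStandIn` and `kebabCaseSketch` are hypothetical stand-ins, and the word-splitting regex is an assumption; the library's real kebabCase may split words differently.

```typescript
// Stand-in for deburr that covers only two non-decomposing characters.
const SPECIAL: Record<string, string> = {
    '\u00e6': 'ae', // æ
    '\u00df': 'ss', // ß
};

function deburrStandIn(str: string, mapSpecial: boolean): string {
    const decomposed = str.normalize('NFD');
    let out = '';
    for (let i = 0; i < decomposed.length; i++) {
        const ch = decomposed.charAt(i);
        if (ch >= '\u0300' && ch <= '\u036f') continue; // drop combining marks
        out += mapSpecial ? (SPECIAL[ch] ?? ch) : ch;
    }
    return out;
}

// Hypothetical kebabCase: deburr first, then keep runs of ASCII letters/digits.
function kebabCaseSketch(str: string, mapSpecial: boolean): string {
    const words = deburrStandIn(str, mapSpecial).match(/[A-Za-z0-9]+/g) ?? [];
    return words.map((w) => w.toLowerCase()).join('-');
}

console.log(kebabCaseSketch('Gro\u00dfe Stra\u00dfe', false)); // 'gro-e-stra-e'
console.log(kebabCaseSketch('Gro\u00dfe Stra\u00dfe', true));  // 'grosse-strasse'
```

Under these assumptions, any character that deburr leaves untouched acts as a word boundary downstream, which is why fixing deburr also fixes the helpers built on it.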

Solution

Added a sparse array lookup (31 entries indexed by char code) covering every character in Latin-1 Supplement (U+00C0–U+00FF) and Latin Extended-A (U+0100–U+017F) that doesn't decompose via NFD:

| Characters | Replacement |
| --- | --- |
| Æ/æ | AE/ae |
| Œ/œ | OE/oe |
| ß | ss |
| Ø/ø | O/o |
| Ð/ð | D/d |
| Þ/þ | Th/th |
| Ł/ł | L/l |
| Đ/đ | D/d |
| Ħ/ħ | H/h |
| Ŋ/ŋ | N/n |
| Ĳ/ĳ | IJ/ij |
| Ŧ/ŧ | T/t |
| + others | ... |

The lookup is combined with a single-pass loop over the NFD-normalized string that both skips combining marks and replaces ligatures, so there is no regex in the hot path.

deburr('Hællæ, hva skjera?') // ✅ 'Haellae, hva skjera?'
deburr('Straße')              // ✅ 'Strasse'
deburr('Œuvre')               // ✅ 'OEuvre'
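
The approach described above can be sketched roughly as follows. This is not the code from package/src/string/deburr.ts: the names `LIGATURES` and `deburrSketch` are made up for illustration, and only a subset of the 31 entries is reproduced.

```typescript
// Sparse array indexed by char code. Holes read as undefined.
// Only some of the 31 entries mentioned in the PR are listed here.
const LIGATURES: (string | undefined)[] = [];
LIGATURES[0x00c6] = 'AE'; // Æ
LIGATURES[0x00e6] = 'ae'; // æ
LIGATURES[0x00d8] = 'O';  // Ø
LIGATURES[0x00f8] = 'o';  // ø
LIGATURES[0x00df] = 'ss'; // ß
LIGATURES[0x00de] = 'Th'; // Þ
LIGATURES[0x00fe] = 'th'; // þ
LIGATURES[0x0141] = 'L';  // Ł
LIGATURES[0x0142] = 'l';  // ł
LIGATURES[0x0152] = 'OE'; // Œ
LIGATURES[0x0153] = 'oe'; // œ

function deburrSketch(str: string): string {
    const decomposed = str.normalize('NFD');
    let result = '';
    for (let i = 0; i < decomposed.length; i++) {
        const code = decomposed.charCodeAt(i);
        // Skip combining diacritical marks (U+0300–U+036F) produced by NFD.
        if (code >= 0x0300 && code <= 0x036f) continue;
        // Use the replacement if one exists, otherwise keep the character.
        result += LIGATURES[code] ?? decomposed.charAt(i);
    }
    return result;
}

console.log(deburrSketch('Hællæ, hva skjera?')); // 'Haellae, hva skjera?'
console.log(deburrSketch('Straße'));             // 'Strasse'
console.log(deburrSketch('Œuvre'));              // 'OEuvre'
```

Indexing by char code means each character costs one range check and one array read, with no regex call and no per-character string-keyed lookup.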

Performance

Benchmarked with warmup (3 runs, consistent results):

| Implementation | ops/s | Comparison |
| --- | --- | --- |
| New (this PR) | ~69,000 | ~10% faster than old, ~2.4x faster than lodash |
| Old (broken) | ~62,000 | baseline |
| Lodash | ~28,500 | |

The new implementation is both more correct and faster than the old version.
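
For readers who want to reproduce a comparison like this, a warmup-then-measure loop can be sketched as below. This is a hypothetical helper, not the harness in benchmark/string/deburr.bench.ts; the name `opsPerSecond`, the durations, and the sample string are all assumptions, and the absolute numbers it prints will differ by machine.

```typescript
// Hypothetical micro-benchmark helper: run fn for warmupMs so the JIT can
// optimize it, then count how many calls complete within measureMs.
function opsPerSecond(fn: () => void, warmupMs = 50, measureMs = 200): number {
    const warmupEnd = Date.now() + warmupMs;
    while (Date.now() < warmupEnd) fn();

    let ops = 0;
    const start = Date.now();
    let now = start;
    while (now - start < measureMs) {
        fn();
        ops++;
        now = Date.now(); // adds a small, constant overhead per iteration
    }
    return (ops * 1000) / (now - start);
}

// Example: time a regex-based mark stripper on a mixed sample string.
const sample = 'Hællæ, Straße, Œuvre, crème brûlée';
const stripMarks = (s: string): string =>
    s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(Math.round(opsPerSecond(() => { stripMarks(sample); })), 'ops/s');
```

Running the same helper over each candidate implementation with the same input, several times, gives a rough relative comparison; a dedicated benchmark library will produce more stable figures.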

Changes

  • package/src/string/deburr.ts — Rewrote with sparse array ligature map + NFD single-pass loop
  • package/test/string/deburr.test.ts — Added tests for Latin-1 Supplement ligatures, Latin Extended-A, and mixed diacritics+ligatures
  • benchmark/string/deburr.bench.ts — Updated test charset to include ligatures

The naive NFD normalization approach failed for characters that don't
decompose via NFD, such as æ, œ, ß, ø, ð, þ, ł, and others from
Latin-1 Supplement and Latin Extended-A blocks.

Added a sparse array lookup (31 entries indexed by char code) for these
characters, combined with a single-pass loop that both skips combining
marks and replaces ligatures — eliminating the regex in the hot path.

This makes deburr both correct and ~10% faster than the old version,
and ~2.4x faster than lodash.

Examples that now work:
  deburr('Hællæ, hva skjera?') // => 'Haellae, hva skjera?'
  deburr('Straße')             // => 'Strasse'
  deburr('Œuvre')              // => 'OEuvre'

Also fixes kebabCase, camelCase, snakeCase, pascalCase, and titleCase
which all use deburr internally.
@changeset-bot

changeset-bot Bot commented Mar 17, 2026

🦋 Changeset detected

Latest commit: 5a1e92c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| moderndash | Patch |

@netlify

netlify Bot commented Mar 17, 2026

Deploy Preview for moderndash ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5a1e92c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/moderndash/deploys/69b947ad7fbb37000801b2d4 |
| 😎 Deploy Preview | https://deploy-preview-199--moderndash.netlify.app |
@maxdewald maxdewald merged commit 979c403 into main Mar 17, 2026
9 checks passed
@maxdewald maxdewald linked an issue Mar 17, 2026 that may be closed by this pull request
@github-actions github-actions Bot mentioned this pull request Mar 17, 2026
Development

Successfully merging this pull request may close these issues.

Bug - deburr and kebabCase

2 participants