feat(emoji): generate catalog from Unicode emoji-test.txt #344

Open
dmnyc wants to merge 1 commit into main from feat/emoji-catalog-generator

@dmnyc (Collaborator) commented Jun 7, 2026

Summary

Replace the hand-maintained EmojiData catalog (a frozen port of the Android client) with one generated from the official Unicode 16.0 emoji-test.txt via scripts/generate_emoji_data.py. The old list silently lagged new Unicode releases, so common emoji — arrows, keycap digits, money, many flags, and newer additions like splatter — were missing without anyone noticing.

The catalog now covers ~1,869 emoji across 9 categories (adds a dedicated Flags category with all country/region/subdivision flags). Skin-tone variants are excluded (the app applies tones at render time via its own selector — see #304 / PR C), as are clock-face variants and Japanese ideograph buttons. Only fully-qualified forms are emitted, so entries are in canonical reaction-matching form. Re-run the script to pull in future Unicode releases.
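The filtering described above can be sketched roughly as follows. This is not the actual `scripts/generate_emoji_data.py`. It is a minimal illustration that assumes the standard `emoji-test.txt` line layout (`<code points> ; <status> # <emoji> E<version> <name>`) and uses hypothetical names (`parse_emoji_test`, `SKIN_TONES`, `LINE_RE`). The clock-face and ideograph-button exclusions are omitted for brevity.

```python
import re

# Skin-tone modifier code points U+1F3FB..U+1F3FF.
SKIN_TONES = {0x1F3FB, 0x1F3FC, 0x1F3FD, 0x1F3FE, 0x1F3FF}

# Data line shape: "<code points> ; <status> # <emoji> E<version> <name>"
LINE_RE = re.compile(
    r"^(?P<cps>[0-9A-F ]+?)\s*;\s*(?P<status>[a-z-]+)\s*"
    r"#\s*\S+\s+E\d+\.\d+\s+(?P<name>.+)$"
)


def parse_emoji_test(text):
    """Yield (group, emoji, name) for fully-qualified, tone-free entries."""
    group = None
    for line in text.splitlines():
        # Group headers are comment lines such as "# group: Flags".
        if line.startswith("# group:"):
            group = line[len("# group:"):].strip()
            continue
        match = LINE_RE.match(line)
        if not match or match["status"] != "fully-qualified":
            continue  # skips comments, blanks, and non-canonical forms
        code_points = [int(cp, 16) for cp in match["cps"].split()]
        if any(cp in SKIN_TONES for cp in code_points):
            continue  # tones are applied at render time, not stored
        yield group, "".join(map(chr, code_points)), match["name"]
```

Keeping only `fully-qualified` rows means each emoji is stored once, in the form that includes any required variation selector (for example `263A FE0F` rather than bare `263A`).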

Category curation

  • "Smileys" → "Expressions" — the section also holds animal faces, gestures, and hearts; "Expressions" reads less narrow than the original Unicode group name "Smileys & Emotion".
  • Symbols icon ❤️ → ⁉️ — the heart icon belonged to the Symbols tab when CLDR's heart subgroup lived there. Unicode 16 places those hearts in "Smileys & Emotion", so the Symbols icon needed a glyph that actually evokes Symbols.
  • 💟 (heart decoration) → Symbols — Unicode classifies it under "Smileys & Emotion" but it's a graphic symbol, not a face/emotion. Generator carries a small GROUP_OVERRIDE map so re-runs keep this in Symbols.
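As a rough illustration of how the two tables might interact, the sketch below resolves an emoji's display category by checking the override first and then mapping the Unicode group to a label. Only the names `GROUP_MAP` and `GROUP_OVERRIDE` come from this PR. The table shapes, the partial entries, and the `resolve_category` helper are assumptions made for the example.

```python
# Unicode group name -> (display label, tab icon). Partial and assumed:
# only the groups discussed in this PR are listed.
GROUP_MAP = {
    "Smileys & Emotion": ("Expressions", "😀"),
    "Symbols": ("Symbols", "⁉️"),
    "Flags": ("Flags", "🏁"),
}

# Per-emoji corrections: emoji -> Unicode group to file it under instead.
GROUP_OVERRIDE = {
    "💟": "Symbols",  # heart decoration: a graphic symbol, not a face
}


def resolve_category(emoji, unicode_group):
    """Return the display label for an emoji, honoring overrides."""
    group = GROUP_OVERRIDE.get(emoji, unicode_group)
    label, _icon = GROUP_MAP[group]
    return label
```

Because the override lives in the generator rather than in the generated Swift file, the correction survives every re-run against a newer `emoji-test.txt`.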

Search restoration

The generator commit initially dropped the CldrEmojiKeywords lookup from EmojiData.searchEmojis, so searches fell back to Unicode-name substring matching only and lost coverage like "vulcan" → 🖖, "salute" → 🖖. This is restored in the same commit by re-wiring the CldrEmojiKeywords.keywordsByEmoji lookups and adding a small keywordAliases table for cultural shorthand that CLDR doesn't ship:

  • spock / llap / live long and prosper / trek → 🖖
  • pepe → 🐸
  • lol / lmao → 😂
  • ded / dead → 💀
  • fire / lit → 🔥
  • hundred / perfect / based → 💯
  • o7 → 🫡
  • bored → 🥱
  • clown → 🤡
  • eyes / looking → 👀
  • zap / bolt / lightning → ⚡
  • bitcoin / orange → 🟠 / 🟧
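The layered lookup can be sketched as below. The real `searchEmojis` lives in `EmojiData.swift`. This Python version only illustrates the idea, and it assumes that aliases match the whole query exactly while names and keywords match by substring. The sample tables are tiny illustrative stand-ins, not the shipped CLDR data.

```python
# Illustrative stand-ins for the generated catalog and CLDR keyword data.
NAMES = {
    "🖖": "vulcan salute",
    "😂": "face with tears of joy",
    "🫡": "saluting face",
    "💀": "skull",
}
KEYWORDS_BY_EMOJI = {
    "🖖": ["finger", "hand", "spock"],
    "😂": ["happy", "laugh"],
    "💀": ["death", "monster"],
}
# Hand-curated shorthand that CLDR does not ship (subset of the list above).
KEYWORD_ALIASES = {
    "llap": ["🖖"],
    "lol": ["😂"],
    "lmao": ["😂"],
    "o7": ["🫡"],
    "ded": ["💀"],
}


def search_emojis(query):
    """Return alias hits first, then name or keyword substring matches."""
    q = query.strip().lower()
    if not q:
        return []
    results = list(KEYWORD_ALIASES.get(q, []))
    for emoji, name in NAMES.items():
        if emoji in results:
            continue  # already found through an alias
        keywords = KEYWORDS_BY_EMOJI.get(emoji, [])
        if q in name or any(q in kw for kw in keywords):
            results.append(emoji)
    return results
```

Putting alias hits first keeps shorthand like "lol" from being buried under incidental substring matches.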

EmojiData's public API

categories, allEmojis, searchEmojis, and defaultQuickReactions are unchanged. Existing call sites (post-card reaction display, picker grids) need no changes.

Files

  • EmojiData.swift — replaced; retains hand-curated defaultQuickReactions and keywordAliases, everything else is generator output
  • scripts/generate_emoji_data.py — new generator script with GROUP_MAP (display labels + tab icons) and GROUP_OVERRIDE (per-emoji category corrections)
  • scripts/emoji-test-16.0.txt — new, Unicode 16.0 source data (5,331 lines)

Test plan

  • Open the full emoji library sheet → all 9 categories populated; tab strip reads 😀 (Expressions) 👋 🐶 🍔 ⚽ ✈️ 💡 ⁉️ 🏁
  • Section header reads "Expressions" instead of "Smileys"
  • 💟 (heart decoration) appears under Symbols, not Expressions
  • Search "spock" → 🖖 appears
  • Search "vulcan" → 🖖 appears
  • Search "lol" → 😂 appears
  • Search "bitcoin" → 🟠 / 🟧 appears
  • React to a post with 🤩 (which was missing pre-PR) → reaction sends and renders
  • Build clean for iOS Simulator

Replace the hand-maintained `EmojiData` catalog (a frozen port of the
Android client) with one generated from the official Unicode 16.0
`emoji-test.txt` via `scripts/generate_emoji_data.py`. The old list
silently lagged new Unicode releases, so common emoji — arrows, keycap
digits, money, many flags, and newer additions like the splatter — were
missing without anyone noticing.

The catalog now covers ~1,869 emoji across 9 categories (adds a dedicated
Flags category with all country/region/subdivision flags). Skin-tone
variants are excluded (the app applies tones at render time via its own
selector — separate change), as are clock-face variants and Japanese
ideograph buttons. Only fully-qualified forms are emitted, so entries are
in canonical reaction-matching form. Re-run the script to pull in future
Unicode releases.

`EmojiData`'s public API (`categories`, `allEmojis`, `searchEmojis`,
`defaultQuickReactions`) is unchanged, so consumers need no changes.
@dmnyc dmnyc force-pushed the feat/emoji-catalog-generator branch from 5c48055 to 857190a Compare June 7, 2026 22:05