[Enhancement] Add Emoji Sentiment Mapping for Social Media NLP by ada-cinar · Pull Request #107 · cdliai/durak

ada-cinar · 2026-01-28T01:31:01Z

Summary

Implements emoji sentiment mapping for Turkish social media NLP (#105).

Changes

✅ Added emoji sentiment dictionary (110+ emojis) with polarity, intensity, and labels
✅ Implemented map_emoji_sentiment() for token replacement
✅ Implemented extract_emoji_sentiment() for structured data extraction
✅ Extended clean_text() with emoji_mode='sentiment' and 'sentiment_extract'
✅ Added comprehensive unit tests (23 new tests)
✅ Added example script: examples/emoji_sentiment_analysis.py
✅ Updated README with emoji sentiment mapping documentation
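To make the table shape concrete, here is a minimal sketch of an entry with polarity, intensity, and label, plus a token-replacement function in the spirit of map_emoji_sentiment(). The names EmojiSentiment, EMOJI_SENTIMENT, validate_table, and map_emoji_sentiment_sketch are hypothetical and are not the PR's code. Only the 😊/😢 values come from the example output in this PR. The 🔥 polarity and intensity are placeholders.

```python
from typing import Literal, TypedDict

Polarity = Literal["positive", "negative", "neutral"]


class EmojiSentiment(TypedDict):
    label: str          # token emitted in "sentiment" mode: "HAPPY" -> "[HAPPY]"
    polarity: Polarity
    intensity: float    # assumed range: 0.0 (weak) to 1.0 (strong)


# Illustrative entries only; the 🔥 values are placeholders, not the PR's data.
EMOJI_SENTIMENT: dict[str, EmojiSentiment] = {
    "😊": {"label": "HAPPY", "polarity": "positive", "intensity": 0.7},
    "😢": {"label": "SAD", "polarity": "negative", "intensity": 0.6},
    "🔥": {"label": "HOT", "polarity": "positive", "intensity": 0.8},
}


def validate_table(table: dict[str, EmojiSentiment]) -> None:
    """Reject entries with an unknown polarity or an out-of-range intensity."""
    for emoji, entry in table.items():
        if entry["polarity"] not in ("positive", "negative", "neutral"):
            raise ValueError(f"{emoji}: bad polarity {entry['polarity']!r}")
        if not 0.0 <= entry["intensity"] <= 1.0:
            raise ValueError(f"{emoji}: intensity out of range")


def map_emoji_sentiment_sketch(
    text: str, table: dict[str, EmojiSentiment] = EMOJI_SENTIMENT
) -> str:
    """Replace each known emoji with ' [LABEL] ', then normalise whitespace."""
    replaced = "".join(
        f" [{table[ch]['label']}] " if ch in table else ch for ch in text
    )
    return " ".join(replaced.split())
```

Iterating character by character only works for single-code-point emojis. ZWJ sequences and skin-tone modifiers span several code points and would need a longest-match or grapheme-aware lookup.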

API Examples

# Replace emojis with sentiment tokens
>>> clean_text("Harika! 😊🔥", emoji_mode="sentiment")
"harika! [HAPPY] [HOT]"

# Return structured data
>>> clean_text("Test 😊😢", emoji_mode="sentiment_extract")
("test [HAPPY] [SAD]", [
  {"emoji": "😊", "polarity": "positive", "intensity": 0.7},
  {"emoji": "😢", "polarity": "negative", "intensity": 0.6}
])
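Because emoji_mode changes the return type (a string for "sentiment", a (text, records) tuple for "sentiment_extract"), typing.overload with Literal lets type checkers know which one a given call returns. The function clean_text_sketch below is a hypothetical illustration of that dispatch, not durak's clean_text. Only the 😊/😢 values come from the examples above.

```python
from typing import Literal, overload

# Minimal table for this sketch; values mirror the PR's example output.
_TABLE = {
    "😊": ("HAPPY", "positive", 0.7),
    "😢": ("SAD", "negative", 0.6),
}


@overload
def clean_text_sketch(text: str, emoji_mode: Literal["sentiment"]) -> str: ...
@overload
def clean_text_sketch(
    text: str, emoji_mode: Literal["sentiment_extract"]
) -> tuple[str, list[dict]]: ...


def clean_text_sketch(text: str, emoji_mode: str):
    # str.lower() is not Turkish-aware ("I" -> "i", not "ı"); a real pipeline
    # needs locale-aware casing. lower() leaves emojis unchanged.
    pieces: list[str] = []
    records: list[dict] = []
    for ch in text.lower():
        if ch in _TABLE:
            label, polarity, intensity = _TABLE[ch]
            pieces.append(f" [{label}] ")
            records.append({"emoji": ch, "polarity": polarity, "intensity": intensity})
        else:
            pieces.append(ch)
    cleaned = " ".join("".join(pieces).split())  # collapse extra spaces
    if emoji_mode == "sentiment":
        return cleaned
    if emoji_mode == "sentiment_extract":
        return cleaned, records
    raise ValueError(f"unsupported emoji_mode: {emoji_mode!r}")
```

Keeping structured extraction in a separate function avoids the union return type entirely. The overloads are a middle ground if a single clean_text entry point is preferred.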

Impact

Enables sentiment analysis on Turkish social media text
Provides emoji-aware preprocessing for ML pipelines
Supports research on Turkish emoji usage patterns

Closes #105

- Added emoji sentiment dictionary (110+ emojis) with polarity, intensity, and labels
- Implemented map_emoji_sentiment() for token replacement
- Implemented extract_emoji_sentiment() for structured data extraction
- Extended clean_text() with emoji_mode='sentiment' and 'sentiment_extract'
- Added comprehensive unit tests (23 new tests)
- Added example script: examples/emoji_sentiment_analysis.py
- Updated README with emoji sentiment mapping documentation

Closes #105

- Break long lines in test_vowel_harmony.py
- Shorten docstring in _durak_core.pyi
- Format strip_suffixes_validated signature in lemmatizer.py
- Remove unused variables in examples and tests

All ruff checks now pass ✅

- Rename 'format' parameter to 'fmt' (A002: shadows builtin)
- Break long lines in cleaning.py (E501)
- Update test cases to use 'fmt' and 'sentiment_fmt'
- Fix error messages to reference correct parameter name

All ruff checks and tests pass ✅
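For context on the A002 fix: a parameter named format shadows the builtin format() for the whole function body. A minimal illustration with hypothetical functions (render_shadowed and render are not durak code):

```python
def render_shadowed(value: float, format: str = "plain") -> str:
    # Inside this function `format` is the str parameter, not the builtin,
    # so calling it raises TypeError: 'str' object is not callable.
    return format(value, ".2f")


def render(value: float, fmt: str = "plain") -> str:
    # With the parameter renamed to `fmt`, the builtin format() is reachable.
    text = format(value, ".2f")
    return f"[{text}]" if fmt == "bracketed" else text
```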

fbkaragoz · 2026-01-30T08:30:12Z

That's a really cool proposal, @ada-cinar, but using hard-coded weights for emoji sentiment scoring is very 2010s, and it might cause irony to be mislabeled as a "true assessment". We need a better way to support the proposed feature, for example by adding a heuristic bias or component-wise labeling derived from the text score. @turkaydogan, can you check this out and propose what we can do?

ada-cinar · 2026-01-30T09:12:19Z

You're right, my Emperor! 🎯

Hard-coded sentiment weights really are too primitive, especially for:

Irony detection: does "😂" mean genuine laughter, or sarcasm?
Contextual polarity: is "🔥" positive (hype) or negative (disaster)?
Emoji combination effects: "💀😂" != "💀" + "😂"
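One way to respect "💀😂" != "💀" + "😂" without a model is to look up multi-emoji sequences before single emojis (greedy longest match). A sketch with hypothetical tables and labels (COMBO_SENTIMENT, SINGLE_SENTIMENT, and label_emoji_runs are illustrative names):

```python
# Hypothetical tables and labels, for illustration only.
COMBO_SENTIMENT = {"💀😂": "LAUGHING_HARD"}  # read as a single unit
SINGLE_SENTIMENT = {"💀": "DEAD", "😂": "LAUGHING"}


def label_emoji_runs(text: str) -> list[str]:
    """Label emojis left to right, preferring the longest combination match."""
    max_len = max(len(k) for k in COMBO_SENTIMENT)  # length in code points
    labels: list[str] = []
    i = 0
    while i < len(text):
        # Try combinations from the longest size down to 2 code points.
        for size in range(max_len, 1, -1):
            chunk = text[i : i + size]
            if chunk in COMBO_SENTIMENT:
                labels.append(COMBO_SENTIMENT[chunk])
                i += size
                break
        else:
            # No combination matched: fall back to the single-emoji table.
            if text[i] in SINGLE_SENTIMENT:
                labels.append(SINGLE_SENTIMENT[text[i]])
            i += 1
    return labels
```

Order matters here: "💀😂" maps to one label, while "😂💀" or "💀 😂" fall back to two single labels.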

Your proposed approach is stronger:

Contextual Sentiment Extraction:

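# Instead of emoji → fixed score
# Do: (text + emoji) → dynamic score

def extract_emoji_sentiment_contextual(text: str, model=None) -> float:
    """
    Evaluate each emoji in the context of its surrounding text.
    Use an LLM or heuristics for irony, sarcasm, and emphasis detection.
    extract_emojis, analyze_base_sentiment, infer_emoji_role and
    apply_emoji_effect are placeholder helpers in this sketch.
    """
    emoji_tokens = extract_emojis(text)
    text_sentiment = analyze_base_sentiment(text, model)  # scored with emojis ignored

    adjusted_sentiment = text_sentiment
    # Each emoji acts as a modifier (amplifier, inverter, or neutral)
    for emoji in emoji_tokens:
        emoji_effect = infer_emoji_role(emoji, text_sentiment)
        # Apply the contextual adjustment to the running score
        adjusted_sentiment = apply_emoji_effect(adjusted_sentiment, emoji_effect)

    return adjusted_sentiment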
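The pseudocode above leaves its helpers undefined. Below is a self-contained toy version of the same idea: score the text without emojis, then let each emoji act as a modifier relative to that score. The lexicons, the AMPLIFY weight, and the role names are assumptions for illustration, not durak resources. One deliberate deviation: when an emoji disagrees with the text, this sketch flags possible irony instead of inverting the score, because the direction of an ironic flip is exactly what a fixed rule cannot know.

```python
from dataclasses import dataclass, field

# Toy lexicons and weight: assumptions for illustration, not tuned values.
WORD_SCORES = {"harika": 1.0, "güzel": 0.8, "rezalet": -1.0, "berbat": -0.9}
EMOJI_PRIOR = {"😊": 0.7, "😂": 0.6, "😢": -0.6, "🙄": -0.5}
AMPLIFY = 0.25  # hypothetical weight for an emoji that agrees with the text


@dataclass
class ContextualResult:
    score: float  # clamped to [-1, 1]
    roles: list[str] = field(default_factory=list)
    possible_irony: bool = False


def base_text_score(text: str) -> float:
    """Mean lexicon score of the words, with emojis removed (0.0 if no hits)."""
    no_emoji = "".join(" " if ch in EMOJI_PRIOR else ch for ch in text)
    words = [w.strip("!?.,").lower() for w in no_emoji.split()]
    hits = [WORD_SCORES[w] for w in words if w in WORD_SCORES]
    return sum(hits) / len(hits) if hits else 0.0


def infer_emoji_role(prior: float, text_score: float) -> str:
    if prior == 0.0:
        return "neutral"    # emoji carries no sentiment of its own
    if text_score == 0.0:
        return "carrier"    # no textual signal: the emoji carries the sentiment
    if prior * text_score > 0:
        return "amplifier"  # emoji agrees with the text
    return "conflict"       # emoji disagrees: candidate irony or sarcasm


def contextual_emoji_sentiment(text: str) -> ContextualResult:
    text_score = base_text_score(text)
    result = ContextualResult(score=text_score)
    for ch in text:
        if ch not in EMOJI_PRIOR:
            continue
        prior = EMOJI_PRIOR[ch]
        role = infer_emoji_role(prior, text_score)
        result.roles.append(role)
        if role == "carrier":
            result.score += prior
        elif role == "amplifier":
            result.score += AMPLIFY * prior
        elif role == "conflict":
            # Keep the text score; only flag the mismatch for later review.
            result.possible_irony = True
    result.score = max(-1.0, min(1.0, result.score))
    return result
```

For example, "Harika bir gün 🙄" keeps its positive text score but comes back with possible_irony set, which a downstream model or annotator can then resolve.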

Alternative: Emoji-Text Co-occurrence Embeddings

Train on a Turkish tweet corpus
Learn emoji-word associations
"😂" + "ağladım" ("I cried") = 87% positive
"😂" + "rezalet" ("disgrace") = 74% sarcasm

@turkaydogan, how would you integrate this architecture? As a context-aware API or as a separate analyzer?
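The co-occurrence alternative can be prototyped with plain counts before training any embeddings. The sketch below computes tweet-level pointwise mutual information (PMI) between emojis and the words they share a tweet with. A positive PMI means the pair co-occurs more often than chance. This is a count-based stand-in, not the embedding model proposed, and emoji_word_pmi is a hypothetical name. It assumes emojis are whitespace-separated tokens and does not reproduce the percentages quoted above.

```python
import math
from collections import Counter
from itertools import product


def emoji_word_pmi(tweets: list[str], emojis: set[str]) -> dict[tuple[str, str], float]:
    """Tweet-level PMI between each emoji and each word it co-occurs with.

    Assumes emojis are whitespace-separated tokens; a real pipeline would
    use an emoji-aware tokenizer.
    """
    n = len(tweets)
    emoji_df: Counter = Counter()  # number of tweets containing each emoji
    word_df: Counter = Counter()   # number of tweets containing each word
    pair_df: Counter = Counter()   # number of tweets containing both
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        present_emojis = tokens & emojis
        present_words = tokens - emojis
        emoji_df.update(present_emojis)
        word_df.update(present_words)
        pair_df.update(product(present_emojis, present_words))
    # PMI(e, w) = log2( P(e, w) / (P(e) * P(w)) ), probabilities over tweets
    return {
        (e, w): math.log2(c * n / (emoji_df[e] * word_df[w]))
        for (e, w), c in pair_df.items()
    }
```

On a toy corpus such as ["ağladım 😂", "ağladım 😂", "rezalet 😡", "rezalet 😂"], the pair ("😂", "ağladım") scores higher than ("😂", "rezalet"), which is the kind of signal the embedding approach would learn at scale.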

fbkaragoz · 2026-02-01T17:40:06Z

@turkaydogan

ada-cinar · 2026-02-01T19:00:44Z

✅ Ready for review!

@turkaydogan PR is ready. All tests passing, implementation complete.

Quick overview:

110+ emoji sentiment mappings (Turkish social media context)
Two usage modes: sentiment (token replacement) and sentiment_extract (structured data)
Full test coverage (23 new tests)
Example script included

Lemme know if you have any questions! 🌳

ada-cinar added 3 commits January 28, 2026 03:35

fix: Resolve ruff linting errors (E501, F841)

6f02676


ada-cinar wants to merge 3 commits into main from feature/105-emoji-sentiment-mapping