[Enhancement] Add Emoji Sentiment Mapping for Social Media NLP #107

Open
ada-cinar wants to merge 3 commits into main from feature/105-emoji-sentiment-mapping

Conversation

@ada-cinar

Member

Summary

Implements emoji sentiment mapping for Turkish social media NLP (#105).

Changes

  • ✅ Added emoji sentiment dictionary (110+ emojis) with polarity, intensity, and labels
  • ✅ Implemented map_emoji_sentiment() for token replacement
  • ✅ Implemented extract_emoji_sentiment() for structured data extraction
  • ✅ Extended clean_text() with emoji_mode='sentiment' and 'sentiment_extract'
  • ✅ Added comprehensive unit tests (23 new tests)
  • ✅ Added example script: examples/emoji_sentiment_analysis.py
  • ✅ Updated README with emoji sentiment mapping documentation
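To make the dictionary and token-replacement items above concrete, here is a minimal sketch of how such a mapping could work. The names `EMOJI_SENTIMENT` and `map_emoji_sentiment_sketch` are hypothetical and are not taken from this PR's code; the two entries reuse the labels and intensities shown in the API examples. It only handles single-code-point emojis and does not lowercase the text, which the examples suggest `clean_text()` does separately.

```python
# Hypothetical sketch, not the PR's implementation.
# Entries mirror the values shown in the PR's API examples.
EMOJI_SENTIMENT = {
    "😊": {"label": "HAPPY", "polarity": "positive", "intensity": 0.7},
    "😢": {"label": "SAD", "polarity": "negative", "intensity": 0.6},
}


def map_emoji_sentiment_sketch(text: str) -> str:
    """Replace each known single-code-point emoji with a bracketed label."""
    pieces = []
    for ch in text:
        entry = EMOJI_SENTIMENT.get(ch)
        if entry is None:
            pieces.append(ch)
            continue
        # Separate the token from preceding non-space text or another token.
        if pieces and not pieces[-1].endswith((" ", "\t", "\n")):
            pieces.append(" ")
        pieces.append(f"[{entry['label']}]")
    return "".join(pieces)


print(map_emoji_sentiment_sketch("Test 😊😢"))
```

Emoji sequences built from several code points (ZWJ sequences, skin-tone modifiers) would need a real emoji tokenizer instead of the per-character loop used here.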

API Examples

# Replace emojis with sentiment tokens
>>> clean_text("Harika! 😊🔥", emoji_mode="sentiment")
"harika! [HAPPY] [HOT]"

# Return structured data
>>> clean_text("Test 😊😢", emoji_mode="sentiment_extract")
("test [HAPPY] [SAD]", [
  {"emoji": "😊", "polarity": "positive", "intensity": 0.7},
  {"emoji": "😢", "polarity": "negative", "intensity": 0.6}
])
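As an illustration of how a downstream pipeline might consume the structured records returned by the `sentiment_extract` mode, the sketch below folds them into one signed score. The helper `net_emoji_score` is hypothetical and the averaging rule is an assumption; the PR does not define an aggregation step.

```python
# Hypothetical consumer of the record list shown in the example above.
def net_emoji_score(records):
    """Signed mean intensity: positive emojis add, negative ones subtract."""
    sign = {"positive": 1.0, "negative": -1.0}
    if not records:
        return 0.0
    total = sum(sign.get(r["polarity"], 0.0) * r["intensity"] for r in records)
    return total / len(records)


# Records copied from the PR's example output.
records = [
    {"emoji": "😊", "polarity": "positive", "intensity": 0.7},
    {"emoji": "😢", "polarity": "negative", "intensity": 0.6},
]
print(net_emoji_score(records))  # small positive value, roughly 0.05
```

Records with an unrecognized polarity contribute zero here, which is one possible choice among several.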

Impact

  • Enables sentiment analysis on Turkish social media text
  • Provides emoji-aware preprocessing for ML pipelines
  • Supports research on Turkish emoji usage patterns

Closes #105

- Break long lines in test_vowel_harmony.py
- Shorten docstring in _durak_core.pyi
- Format strip_suffixes_validated signature in lemmatizer.py
- Remove unused variables in examples and tests

All ruff checks now pass ✅

- Rename 'format' parameter to 'fmt' (A002 - shadows builtin)
- Break long lines in cleaning.py (E501)
- Update test cases to use 'fmt' and 'sentiment_fmt'
- Fix error messages to reference correct parameter name

All ruff checks and tests pass ✅
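
For context on the A002 rename, this small sketch shows why a parameter called `format` is risky: inside the function it hides the builtin `format()`. The function names are illustrative only and do not come from `cleaning.py`.

```python
def render_shadowed(value, format):
    # `format` is now the string argument, so the builtin is unreachable here.
    try:
        return format(value, ".2f")
    except TypeError as exc:
        return f"error: {exc}"


def render(value, fmt):
    # With the parameter renamed, the builtin format() is available again.
    return format(value, fmt)


print(render_shadowed(3.14159, ".2f"))  # error: 'str' object is not callable
print(render(3.14159, ".2f"))           # 3.14
```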
@fbkaragoz

Member

That's a really cool proposal, @ada-cinar, but hard-coded weights for emoji sentiment scoring are very 2010s and might cause irony to be mislabeled as a "true assessment". We need to find a better way to support the proposed feature, such as adding a heuristic bias or component-level labeling induced from the text score. @turkaydogan, can you check this out and propose what we can do?

@ada-cinar

Member Author

You're right, my Emperor! 🎯

Hard-coded sentiment weights really are too primitive. In particular:

  1. Irony detection: is "😂" there because the author is genuinely laughing, or is it sarcasm?
  2. Contextual polarity: is "🔥" positive (hype) or negative (disaster)?
  3. Emoji combination effects: "💀😂" != "💀" + "😂"
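
One way to respect the combination effect in point 3 is a longest-match lookup, where a known emoji sequence gets its own label before falling back to single emojis. This is only a sketch: the tables and label names below are invented for illustration and are not part of the PR's dictionary.

```python
from typing import List

# Hypothetical tables; labels are illustrative only.
COMBO_LABELS = {"💀😂": "DYING_OF_LAUGHTER"}
SINGLE_LABELS = {"💀": "SKULL", "😂": "LAUGH"}


def label_emoji_run(run: str) -> List[str]:
    """Label a run of emojis, preferring the longest known combination."""
    labels = []
    max_len = max(len(key) for key in COMBO_LABELS)
    i = 0
    while i < len(run):
        # Try the longest candidate first, down to length 2.
        for size in range(min(max_len, len(run) - i), 1, -1):
            chunk = run[i:i + size]
            if chunk in COMBO_LABELS:
                labels.append(COMBO_LABELS[chunk])
                i += size
                break
        else:
            # No combination matched at this position: use the single label.
            labels.append(SINGLE_LABELS.get(run[i], "UNKNOWN"))
            i += 1
    return labels


print(label_emoji_run("💀😂"))  # ['DYING_OF_LAUGHTER']
print(label_emoji_run("😂💀"))  # ['LAUGH', 'SKULL']
```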

The approach you suggested is stronger:

Contextual Sentiment Extraction:

# Instead of: emoji → fixed score
# Do: (text + emoji) → dynamic score

def extract_emoji_sentiment_contextual(text: str, model=None):
    """
    Evaluate each emoji in the context of the surrounding text.
    Use an LLM or heuristic for irony, sarcasm, and emphasis detection.
    """
    emoji_tokens = extract_emojis(text)
    text_sentiment = analyze_base_sentiment(text)  # Without emojis

    # Start from the text-only score, then let each emoji modify it
    adjusted_sentiment = text_sentiment

    # Emoji modifier effect (amplifier, inverter, neutral)
    for emoji in emoji_tokens:
        emoji_effect = infer_emoji_role(emoji, text_sentiment)
        # Apply contextual adjustment to adjusted_sentiment (left open in this proposal)

    return adjusted_sentiment
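
The proposal above leaves the role inference and the adjustment step open. A minimal heuristic version, assuming a text score in [-1, 1] and a small prior table, might look like the following. Everything here (`EMOJI_PRIORS`, the 0.2 threshold, the scaling factors, and this particular `infer_emoji_role` body) is an assumption for illustration, not the design agreed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class EmojiPrior:
    polarity: float   # +1.0 for positive, -1.0 for negative
    intensity: float  # 0.0 .. 1.0


# Toy priors; intensities reuse the values from the PR's API example.
EMOJI_PRIORS: Dict[str, EmojiPrior] = {
    "😊": EmojiPrior(1.0, 0.7),
    "😢": EmojiPrior(-1.0, 0.6),
}


def infer_emoji_role(emoji: str, text_sentiment: float, threshold: float = 0.2) -> str:
    """Classify the emoji as amplifier, inverter, or neutral for this text."""
    prior = EMOJI_PRIORS.get(emoji)
    if prior is None or abs(text_sentiment) < threshold:
        return "neutral"
    return "amplifier" if prior.polarity * text_sentiment > 0 else "inverter"


def adjust_sentiment(text_sentiment: float, emojis: Iterable[str]) -> float:
    """Scale the text-only score up or down according to each emoji's role."""
    score = text_sentiment
    for emoji in emojis:
        role = infer_emoji_role(emoji, text_sentiment)
        if role == "neutral":
            continue
        strength = EMOJI_PRIORS[emoji].intensity
        if role == "amplifier":
            score *= 1.0 + 0.5 * strength  # agreeing emoji strengthens the score
        else:
            score *= 1.0 - strength        # conflicting emoji dampens it
    return max(-1.0, min(1.0, score))


print(adjust_sentiment(0.5, ["😊"]))   # amplified
print(adjust_sentiment(-0.5, ["😊"]))  # dampened, possible irony
```

Dampening on conflict is a conservative choice: it avoids asserting sarcasm outright while still reducing confidence in the text-only score.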

Alternative: Emoji-Text Co-occurrence Embeddings

  • Train on a Turkish tweet corpus
  • Learn emoji-word associations
  • "😂" + "ağladım" ("I cried") = 87% positive
  • "😂" + "rezalet" ("disgrace") = 74% sarcasm
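
As a much simpler stand-in for learned embeddings, emoji-word association can be estimated with document-level pointwise mutual information (PMI). The sketch below uses a four-document toy corpus made up for illustration, so its numbers say nothing about real Turkish tweets; `emoji_word_pmi` is a hypothetical helper.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Set, Tuple


def emoji_word_pmi(docs: List[List[str]], emojis: Set[str]) -> Dict[Tuple[str, str], float]:
    """PMI (log base 2) for each emoji-word pair that shares a document."""
    n_docs = len(docs)
    token_df = Counter()  # number of documents containing each token
    pair_df = Counter()   # number of documents containing each (emoji, word) pair
    for tokens in docs:
        present = set(tokens)
        token_df.update(present)
        pair_df.update(product(present & emojis, present - emojis))
    return {
        (e, w): math.log2(
            (count / n_docs) / ((token_df[e] / n_docs) * (token_df[w] / n_docs))
        )
        for (e, w), count in pair_df.items()
    }


# Toy corpus, invented for this example.
docs = [
    ["gülmekten", "ağladım", "😂"],
    ["ağladım", "😂"],
    ["rezalet", "😂"],
    ["rezalet", "hizmet"],
]
pmi = emoji_word_pmi(docs, {"😂"})
print(pmi[("😂", "ağladım")] > pmi[("😂", "rezalet")])  # True
```

A real version would need sentiment or sarcasm labels on the corpus to produce percentages like the ones listed above; PMI alone only measures association strength.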

@turkaydogan, how would you integrate this architecture? As a context-aware API or as a separate analyzer?

@fbkaragoz

Member

@turkaydogan

@ada-cinar

Member Author

Ready for review!

@turkaydogan PR is ready. All tests passing, implementation complete.

Quick overview:

  • 110+ emoji sentiment mappings (Turkish social media context)
  • Two usage modes: sentiment (token replacement) and sentiment_extract (structured data)
  • Full test coverage (23 new tests)
  • Example script included

Lemme know if you have any questions! 🌳


2 participants