xga0/lightlemma

LightLemma

A lightweight, fast English lemmatizer and stemmer for high-performance text normalization.

Quick Start

pip install lightlemma

from lightlemma import lemmatize, stem, tokenize, text_to_lemmas

# Lemmatization (dictionary-based, linguistically accurate)
lemmatize("running")  # → "run"
lemmatize("better")   # → "good"

# Stemming (rule-based, faster)
stem("running")       # → "run"
stem("studies")       # → "studi"

# Tokenization
tokenize("Hello, world!")  # → ["hello", "world"]

# Complete text processing
text_to_lemmas("The cats are running")  # → ["the", "cat", "be", "run"]
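To make the shape of that last call concrete, here is a minimal, self-contained sketch of a tokenize-then-normalize pipeline. It does not use LightLemma and is not its implementation: `toy_tokenize`, `toy_lemmatize`, `toy_text_to_lemmas`, and the three-entry `TOY_LEMMAS` table are hypothetical stand-ins chosen only to reproduce the outputs shown above.

```python
import re

# Hypothetical lookup table for illustration only; a real lemmatizer
# ships a far larger dictionary plus morphological rules.
TOY_LEMMAS = {"cats": "cat", "are": "be", "running": "run"}


def toy_tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


def toy_text_to_lemmas(text):
    """Tokenize first, then normalize each token independently."""
    return [toy_lemmatize(token) for token in toy_tokenize(text)]


print(toy_tokenize("Hello, world!"))               # ['hello', 'world']
print(toy_text_to_lemmas("The cats are running"))  # ['the', 'cat', 'be', 'run']
```

Whether LightLemma's `text_to_lemmas` is composed exactly this way is an assumption; see README_FULL.md for the actual behavior and options.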

Key Features

  • Fast lemmatization: Dictionary-based word normalization
  • Porter stemmer: Rule-based suffix removal
  • Flexible tokenization: Customizable text splitting
  • Zero dependencies: Lightweight and self-contained
  • Simple API: Easy integration into existing projects

Lemmatization vs Stemming

Feature  | Lemmatization           | Stemming
---------|-------------------------|--------------------------
Output   | Real words              | May produce non-words
Method   | Dictionary + morphology | Rule-based suffix removal
Speed    | Moderate                | Fast
Accuracy | High                    | Moderate
Example  | "studies" → "study"     | "studies" → "studi"
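
The difference in the "Output" and "Example" rows can be sketched without the library. The snippet below contrasts a dictionary lookup with a single suffix-stripping rule set (step 1a of the Porter algorithm only). Both functions and the two-entry `TOY_DICTIONARY` are hypothetical illustrations, not LightLemma's internals.

```python
# Toy data for illustration only; not LightLemma's dictionary.
TOY_DICTIONARY = {"studies": "study", "better": "good"}


def lookup_lemma(word):
    """Dictionary approach: map a known form to its real-word lemma."""
    return TOY_DICTIONARY.get(word, word)


def strip_step1a(word):
    """Rule approach: Porter step 1a only (SSES -> SS, IES -> I, SS -> SS, S -> '')."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


for word in ["studies", "better"]:
    print(word, lookup_lemma(word), strip_step1a(word))
# studies study studi
# better good better
```

The lookup can return an irregular form such as "good" because it stores the answer, while the rule only trims characters and can leave a non-word such as "studi".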

Documentation

For detailed usage examples, advanced features, and complete API documentation, see README_FULL.md.

License

MIT License - see the LICENSE file for details.