A lightweight, fast English lemmatizer and stemmer for high-performance text normalization.
```bash
pip install lightlemma
```

```python
from lightlemma import lemmatize, stem, tokenize, text_to_lemmas

# Lemmatization (dictionary-based, linguistically accurate)
lemmatize("running")  # → "run"
lemmatize("better")   # → "good"

# Stemming (rule-based, faster)
stem("running")  # → "run"
stem("studies")  # → "studi"

# Tokenization
tokenize("Hello, world!")  # → ["hello", "world"]

# Complete text processing
text_to_lemmas("The cats are running")  # → ["the", "cat", "be", "run"]
```

- Fast lemmatization: Dictionary-based word normalization
- Porter stemmer: Rule-based suffix removal
- Flexible tokenization: Customizable text splitting
- Zero dependencies: Lightweight and self-contained
- Simple API: Easy integration into existing projects
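To illustrate the "rule-based suffix removal" approach behind the Porter stemmer bullet above, here is a heavily simplified sketch. It is not LightLemma's actual implementation; `SUFFIX_RULES` and `naive_stem` are hypothetical names, and the real Porter algorithm has many more rules and conditions.

```python
# Illustrative only — a tiny subset of Porter-style suffix stripping,
# not LightLemma's real stemmer.
SUFFIX_RULES = [
    ("sses", "ss"),  # "classes" → "class"
    ("ies", "i"),    # "studies" → "studi" (non-words are acceptable output)
    ("ing", ""),     # "running" → "runn", cleaned up below
    ("s", ""),       # plain plural
]

def naive_stem(word: str) -> str:
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + replacement
            # Undouble a trailing consonant ("runn" → "run"), a simplified
            # version of Porter's step-1b cleanup (l, s, z are exempt).
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word

naive_stem("running")  # → "run"
naive_stem("studies")  # → "studi"
```

Because the rules never consult a dictionary, the output can be a non-word like "studi" — which is exactly the speed/accuracy trade-off summarized in the table below.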
| Feature | Lemmatization | Stemming |
|---|---|---|
| Output | Real words | May produce non-words |
| Method | Dictionary + morphology | Rule-based suffix removal |
| Speed | Moderate | Fast |
| Accuracy | High | Moderate |
| Example | "studies" → "study" | "studies" → "studi" |
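The dictionary-plus-morphology approach in the table can be sketched as follows. This is illustrative only — LightLemma ships its own dictionary and rules; `IRREGULAR` and `naive_lemmatize` are hypothetical names:

```python
# Illustrative only — irregular forms are looked up first, then a
# trivial morphological fallback handles regular inflection.
IRREGULAR = {
    "better": "good",
    "are": "be",
    "ran": "run",
    "mice": "mouse",
}

def naive_lemmatize(word: str) -> str:
    word = word.lower()
    if word in IRREGULAR:          # dictionary lookup for irregular forms
        return IRREGULAR[word]
    if word.endswith("ies"):       # "studies" → "study", a real word
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]           # "cats" → "cat"
    return word

naive_lemmatize("better")   # → "good"
naive_lemmatize("studies")  # → "study"
```

The dictionary lookup is what lets a lemmatizer map "better" → "good" or "studies" → "study" where a stemmer can only strip suffixes — hence the real-word output and higher accuracy in the comparison above.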
For detailed usage examples, advanced features, and complete API documentation, see README_FULL.md.
MIT License - see the LICENSE file for details.