feat: Add performance metrics collection to Lemmatizer#68
Open
ada-cinar wants to merge 9 commits into
Open
Conversation
- Add gold-standard test set with 73 Turkish word-lemma pairs - Create evaluate_lemmatizer.py script for strategy comparison - Implement baseline storage for regression detection - Achieve 97.3% accuracy with lookup/hybrid strategies - Add comprehensive evaluation documentation Resolves #56
- Expand gold_standard.tsv to 109 test cases (100+ requirement met) - Add conditional tense, imperatives, participles - Add proper nouns with apostrophes - Add compound words and complex suffix chains - Add adjective-to-noun derivations - Update baseline metrics (lookup: 68.8%, hybrid: 69.7%, heuristic: 18.3%) - Lower accuracy reflects more challenging test set - Better represents real-world lemmatization complexity - Add CI regression testing to .github/workflows/tests.yml - Fails build if accuracy drops >5% from baseline - Runs on Python 3.11 after unit tests - Document strategy selection in BEST_PRACTICES.md - Add comparison table with accuracy benchmarks - Provide usage guidelines for each strategy - Include custom dataset evaluation instructions All success criteria from issue #56 now met: ✅ 100+ hand-curated test pairs ✅ Evaluation script with metrics ✅ Baseline metrics stored ✅ CI job for regression detection ✅ Strategy comparison documentation
- Add LemmatizerMetrics dataclass with performance tracking - Call counts (total, lookup hits/misses, heuristic calls) - Timing metrics (total, lookup, heuristic time) - Computed properties (cache hit rate, avg call time) - Extend Lemmatizer class with metrics support - collect_metrics parameter (default: False, zero overhead) - get_metrics() and reset_metrics() methods - Per-call timing instrumentation using perf_counter - Updated __repr__ to show metrics status - Add comprehensive test suite - 11 new tests covering all metrics scenarios - Tests for lookup, heuristic, hybrid strategies - Timing validation, reset functionality - Computed properties verification - Add interactive demo script - examples/lemmatizer_metrics_demo.py - Basic metrics collection example - Strategy comparison benchmark - Large corpus performance test - Incremental monitoring demo - Export LemmatizerMetrics in __init__.py Benefits: ✅ Data-driven strategy selection ✅ Performance debugging and profiling ✅ Research reproducibility ✅ Production monitoring capability ✅ Zero overhead when disabled Related to #56 (Lemma Evaluation Framework) - metrics enable deeper performance analysis during evaluation.
- Add new Lemmatization section with strategy overview - Document performance metrics collection feature - Add usage examples for metrics and strategy comparison - Reference example demo script Completes documentation for issue #63
Improve metrics collection pattern in Lemmatizer: - Replace 'if self.collect_metrics' with 'if self._metrics is not None' - More robust and idiomatic pattern - Avoids potential state inconsistencies - All metrics tests passing (11/11) Related to #63
- Add None checks for timing variables (start_time, lookup_start, heuristic_start) - Add assertion in get_metrics() to satisfy mypy return type - Fixes mypy [operator] and [return-value] errors
Member
|
@ada-cinar can you check out the conflicts preventing the merge? Ill be in my work station soon |
Member
Author
|
✅ Merge conflict çözüldü! Rebase yaptım, conflict'ler temizlendi. Şimdi merge'e hazır. 🔍 Son kontroller:
Ready for merge! 🚀🌳 |
Member
Author
|
Rebase'den sonra hâlâ |
Member
Author
🔍 Merge Conflict AnaliziDurum: Ciddi conflict var, manuel çözüm gerekiyor. Conflict'teki dosyalar:
Sebep:
Çözüm seçenekleri:
Öneri: Manuel rebase en temiz çözüm olur. İmparatorum'un lokal'de çözmesini öneriyorum. 🌳 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #63
Summary
Adds comprehensive performance metrics collection to the Lemmatizer class, enabling data-driven strategy selection, performance debugging, and production monitoring.
Changes
1. LemmatizerMetrics Dataclass
2. Extended Lemmatizer Class
collect_metricsparameter (default:False) - zero overhead when disabledget_metrics()- retrieve current metricsreset_metrics()- reset counters to zeroperf_counterfor accurate timing__repr__to show metrics status3. Comprehensive Test Suite
Added 11 new tests covering:
All tests passing ✅
4. Interactive Demo Script
Includes:
Example output:
5. README Documentation
Usage
Basic Example
Strategy Comparison
Benefits
collect_metrics=FalseTesting Results
pytest tests/test_lemmatizer.py -v # 20 passed in 0.03sAll metrics tests passing with:
Integration with Issue #56
This directly enhances the evaluation framework from #56:
Success Criteria (from #63)
All requirements complete! 🎉
Related Issues
Ready for review! 🚀