feat: Add Rust unit tests to CI pipeline (closes #71) by ada-cinar · Pull Request #72 · cdliai/durak

ada-cinar · 2026-01-27T05:32:56Z

Closes #71

Summary

Adds Rust unit tests, clippy linting, and code formatting checks to the CI/CD pipeline, ensuring Rust core quality is validated before merge.

Changes

1. Rust Unit Tests

Add cargo test --all-features step
Runs 6 existing tests:
- test_lemma_dict_loading
- test_lookup_lemma_high_frequency_nouns
- test_lookup_lemma_high_frequency_verbs
- test_lookup_lemma_oov_words
- test_lemma_dict_format_validation
- test_strip_suffixes_basic

2. Rust Linting (Clippy)

Add cargo clippy -- -D warnings
Fail build on any clippy warnings
Enforces Rust best practices

3. Rust Formatting (rustfmt)

Add cargo fmt --check
Ensures consistent code style
Applied cargo fmt to src/lib.rs (whitespace/import order fixes)

4. CI Execution Order

Run Rust checks before Python tests to catch core issues early

Local Testing

All checks pass:

$ cargo test --all-features
test result: ok. 6 passed; 0 failed

$ cargo clippy -- -D warnings
Finished `dev` profile [unoptimized + debuginfo]

$ cargo fmt --check
(no output = success)

Benefits

✅ Catch Rust-level bugs early - Lemma dict parsing, normalization logic, suffix stripping
✅ Prevent resource loading regressions - Validates embedded stopwords/lemma dict
✅ Enforce code quality - Clippy warnings catch common issues
✅ Consistent formatting - rustfmt ensures uniform style
✅ Documentation via tests - Tests show expected behavior of core functions

Related Issues

Issue #48 ([Test Coverage] Add Native Rust Unit Tests to src/lib.rs): tests exist, now run in CI ✅
Issue #59 ([Enhancement] Add Fuzzing Infrastructure for Rust Core): can build on this CI foundation
Issue #45 ([Enhancement] Add Security Scanning to CI/CD Pipeline (Dependabot + CodeQL)): should include Rust code scanning

Checklist

✅ Add cargo test step to .github/workflows/tests.yml
✅ Add cargo clippy for Rust linting
✅ Add cargo fmt --check for Rust formatting
Update CONTRIBUTING.md to mention Rust tests must pass (future)

Ready for review! 🚀

- Add gold-standard test set with 73 Turkish word-lemma pairs
- Create evaluate_lemmatizer.py script for strategy comparison
- Implement baseline storage for regression detection
- Achieve 97.3% accuracy with lookup/hybrid strategies
- Add comprehensive evaluation documentation

Resolves #56
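
The evaluation described in this commit amounts to scoring a lemmatizer against word-lemma pairs. The following is a minimal sketch of that idea, not the contents of evaluate_lemmatizer.py: the two-column tab-separated layout, the function names load_pairs and accuracy, and the toy lookup table are all assumptions made for illustration.

```python
import csv
import io


def load_pairs(tsv_text):
    """Parse 'word<TAB>lemma' lines into (word, lemma) tuples.

    The two-column layout is an assumption; the real gold-standard
    file may carry extra columns or a header row.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def accuracy(pairs, lemmatize):
    """Return the percentage of pairs where lemmatize(word) equals the gold lemma."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if lemmatize(word) == gold)
    return 100.0 * correct / len(pairs)


# Hypothetical sample data, not taken from the repository's test set.
SAMPLE = "kitaplar\tkitap\nevler\tev\ngeldim\tgel\ngözlük\tgözlük\n"

# Toy lookup-only strategy: unknown words are returned unchanged.
TOY_LOOKUP = {"kitaplar": "kitap", "evler": "ev"}


def toy_lemmatize(word):
    return TOY_LOOKUP.get(word, word)
```

Calling accuracy(load_pairs(SAMPLE), toy_lemmatize) scores the toy strategy on the sample; swapping in a different callable is how strategies could be compared on the same pairs.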

- Expand gold_standard.tsv to 109 test cases (100+ requirement met)
  - Add conditional tense, imperatives, participles
  - Add proper nouns with apostrophes
  - Add compound words and complex suffix chains
  - Add adjective-to-noun derivations
- Update baseline metrics (lookup: 68.8%, hybrid: 69.7%, heuristic: 18.3%)
  - Lower accuracy reflects more challenging test set
  - Better represents real-world lemmatization complexity
- Add CI regression testing to .github/workflows/tests.yml
  - Fails build if accuracy drops >5% from baseline
  - Runs on Python 3.11 after unit tests
- Document strategy selection in BEST_PRACTICES.md
  - Add comparison table with accuracy benchmarks
  - Provide usage guidelines for each strategy
  - Include custom dataset evaluation instructions

All success criteria from issue #56 now met:
✅ 100+ hand-curated test pairs
✅ Evaluation script with metrics
✅ Baseline metrics stored
✅ CI job for regression detection
✅ Strategy comparison documentation
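
The regression gate mentioned in this commit can be pictured roughly as below. This is a sketch under stated assumptions rather than the repository's CI script: the name find_regressions, the in-memory BASELINE dict, and the reading of ">5%" as percentage points (not a relative drop) are all hypothetical. Only the three baseline figures come from the commit message.

```python
# Baseline accuracies (percent) as quoted in the commit message.
BASELINE = {"lookup": 68.8, "hybrid": 69.7, "heuristic": 18.3}

# Assumption: ">5%" means more than 5 percentage points.
MAX_DROP = 5.0


def find_regressions(current, baseline=BASELINE, max_drop=MAX_DROP):
    """Return (strategy, baseline_acc, current_acc) for each regressed strategy.

    A strategy missing from `current` is treated as 0.0 accuracy,
    so it is reported as a regression instead of being skipped.
    """
    regressions = []
    for strategy, base_acc in baseline.items():
        cur_acc = current.get(strategy, 0.0)
        if base_acc - cur_acc > max_drop:
            regressions.append((strategy, base_acc, cur_acc))
    return regressions


def exit_code(current):
    """Map the check to a process exit code: non-zero fails a CI step."""
    return 1 if find_regressions(current) else 0
```

A CI step could pass exit_code(...) to sys.exit so that any flagged strategy fails the build. Whether the real check compares absolute or relative drops is not stated in the source.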

- Add LemmatizerMetrics dataclass with performance tracking
  - Call counts (total, lookup hits/misses, heuristic calls)
  - Timing metrics (total, lookup, heuristic time)
  - Computed properties (cache hit rate, avg call time)
- Extend Lemmatizer class with metrics support
  - collect_metrics parameter (default: False, zero overhead)
  - get_metrics() and reset_metrics() methods
  - Per-call timing instrumentation using perf_counter
  - Updated __repr__ to show metrics status
- Add comprehensive test suite
  - 11 new tests covering all metrics scenarios
  - Tests for lookup, heuristic, hybrid strategies
  - Timing validation, reset functionality
  - Computed properties verification
- Add interactive demo script
  - examples/lemmatizer_metrics_demo.py
  - Basic metrics collection example
  - Strategy comparison benchmark
  - Large corpus performance test
  - Incremental monitoring demo
- Export LemmatizerMetrics in __init__.py

Benefits:
✅ Data-driven strategy selection
✅ Performance debugging and profiling
✅ Research reproducibility
✅ Production monitoring capability
✅ Zero overhead when disabled

Related to #56 (Lemma Evaluation Framework) - metrics enable deeper performance analysis during evaluation.
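
To make the shape of such a metrics container concrete, here is a minimal sketch of a dataclass with call counts, timing totals, and two computed properties. It is deliberately named MetricsSketch because it is not the library's LemmatizerMetrics: every field and property name below is an assumption inferred from the commit message.

```python
from dataclasses import dataclass


@dataclass
class MetricsSketch:
    """Illustrative stand-in for a lemmatizer metrics container."""

    # Call counts (hypothetical field names).
    total_calls: int = 0
    lookup_hits: int = 0
    lookup_misses: int = 0
    heuristic_calls: int = 0

    # Accumulated wall-clock time in seconds (hypothetical field names).
    total_time: float = 0.0
    lookup_time: float = 0.0
    heuristic_time: float = 0.0

    @property
    def cache_hit_rate(self) -> float:
        """Fraction of dictionary lookups that hit; 0.0 when none were made."""
        lookups = self.lookup_hits + self.lookup_misses
        return self.lookup_hits / lookups if lookups else 0.0

    @property
    def avg_call_time(self) -> float:
        """Mean seconds per call; 0.0 when no calls were recorded."""
        return self.total_time / self.total_calls if self.total_calls else 0.0
```

Guarding both properties against a zero denominator keeps a freshly created or freshly reset instance safe to read.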

- Add new Lemmatization section with strategy overview
- Document performance metrics collection feature
- Add usage examples for metrics and strategy comparison
- Reference example demo script

Completes documentation for issue #63

Improve metrics collection pattern in Lemmatizer:
- Replace 'if self.collect_metrics' with 'if self._metrics is not None'
  - More robust and idiomatic pattern
  - Avoids potential state inconsistencies
- All metrics tests passing (11/11)

Related to #63
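
The pattern change can be illustrated with a small sketch. Checking the metrics object itself, rather than a separate boolean flag, means there is a single source of truth: the flag can never say "on" while the object is missing. The classes SketchLemmatizer and _Counters below are hypothetical stand-ins, not the library's Lemmatizer, and the dictionary-only lemmatize method is a simplification.

```python
from time import perf_counter


class _Counters:
    """Tiny hypothetical metrics holder used only by this sketch."""

    def __init__(self):
        self.total_calls = 0
        self.total_time = 0.0


class SketchLemmatizer:
    def __init__(self, lemma_dict, collect_metrics=False):
        self._lemma_dict = lemma_dict
        # The metrics object doubles as the on/off switch.
        self._metrics = _Counters() if collect_metrics else None

    def lemmatize(self, word):
        # Fast path: no timing calls at all when metrics are disabled.
        if self._metrics is None:
            return self._lemma_dict.get(word, word)
        start = perf_counter()
        lemma = self._lemma_dict.get(word, word)
        self._metrics.total_calls += 1
        self._metrics.total_time += perf_counter() - start
        return lemma

    def get_metrics(self):
        """Return the metrics holder, or None when collection is disabled."""
        return self._metrics

    def reset_metrics(self):
        # Only replace the holder if collection was enabled to begin with.
        if self._metrics is not None:
            self._metrics = _Counters()
```

With collection disabled, get_metrics() returns None and lemmatize never touches perf_counter, which is one way to realise the "zero overhead when disabled" goal.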

- Add 'cargo test --all-features' step to run Rust unit tests (6 tests)
- Add 'cargo clippy -- -D warnings' for Rust linting
- Add 'cargo fmt --check' for Rust code formatting validation
- Run Rust checks before Python tests to catch core issues early
- Format Rust code with 'cargo fmt' (whitespace/import order fixes)

Benefits:
✅ Catch Rust-level bugs in CI (lemma dict, normalization, suffix stripping)
✅ Enforce code quality standards (clippy warnings)
✅ Ensure consistent formatting (rustfmt)
✅ Prevent resource loading regressions

Closes #71

fbkaragoz · 2026-01-30T08:25:59Z

@copilot fix the conflict errors if possible

Copilot · 2026-01-30T08:26:07Z

@fbkaragoz I've opened a new pull request, #137, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: fbkaragoz <59958216+fbkaragoz@users.noreply.github.com>

ada-cinar added 9 commits January 27, 2026 04:35

fix: Resolve linting issues (unused imports, line length, typing)

0607871

fix: Resolve remaining E501 line length issues in evaluate_lemmatizer.py

dfa7e87

fix: Remove unnecessary open() mode and fix Dict type hints

010015a

fbkaragoz added the help wanted label (Extra attention is needed) Jan 30, 2026

Initial plan

ca83b83

Copilot AI mentioned this pull request Jan 30, 2026

fix: Resolve merge conflicts with main branch for Rust CI integration #137

Merged

Copilot AI and others added 4 commits January 30, 2026 08:27

fix: Resolve merge conflicts with main branch

b155387

Co-authored-by: fbkaragoz <59958216+fbkaragoz@users.noreply.github.com>

fix: Add missing test_properties.py and strategies.py from main

af56969

Co-authored-by: fbkaragoz <59958216+fbkaragoz@users.noreply.github.com>

perf: Run Rust checks only once instead of for each Python version

c6d0e8f

Co-authored-by: fbkaragoz <59958216+fbkaragoz@users.noreply.github.com>

Merge pull request #137 from cdliai/copilot/sub-pr-72

a308cd2

ada-cinar wants to merge 14 commits into main from feature/71-ci-rust-tests

3 participants
