feat(extraction): add TrapPruner, MissingnessRecognizer, TargetLeakageAuditor by nxank4 · Pull Request #70 · codepawl/loclean

nxank4 · 2026-02-26T20:12:47Z

Summary

Adds three new LLM-driven extraction modules for automated data quality analysis:

New Modules

Module	API	Purpose
`TrapPruner`	`loclean.prune_traps()`	Detects uncorrelated Gaussian noise columns via statistical profiling + LLM verification
`MissingnessRecognizer`	`loclean.recognize_missingness()`	Identifies MNAR patterns and encodes as boolean `{col}_mnar` feature flags
`TargetLeakageAuditor`	`loclean.audit_leakage()`	Semantic timeline evaluation to detect features that leak the target variable
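The statistical half of `TrapPruner` can be pictured as a pre-filter. It flags numeric columns whose shape looks Gaussian (sample skewness and excess kurtosis near zero) and whose Pearson correlation with the target is near zero. Only those candidates would go on to LLM verification. The PR does not say which statistics `TrapPruner` actually computes, so several parts of the sketch below are assumptions: the moment-based shape test, the helper names (`pearson`, `skew_and_excess_kurtosis`, `noise_candidates`) and the thresholds. It uses only the standard library.

```python
import math
import random
from statistics import fmean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x, mx), pstdev(y, my)
    if sx == 0 or sy == 0:
        return 0.0
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def skew_and_excess_kurtosis(x: list[float]) -> tuple[float, float]:
    """Moment-based shape statistics; both are close to 0 for Gaussian data."""
    m = fmean(x)
    s = pstdev(x, m)
    if s == 0:
        return math.inf, math.inf  # constant column: never treated as Gaussian noise
    z = [(v - m) / s for v in x]
    return fmean([v**3 for v in z]), fmean([v**4 for v in z]) - 3.0


def noise_candidates(
    columns: dict[str, list[float]],
    target: list[float],
    max_abs_corr: float = 0.1,  # hypothetical thresholds, not taken from the PR
    max_abs_skew: float = 0.3,
    max_abs_kurt: float = 0.5,
) -> list[str]:
    """Names of columns that look Gaussian and are nearly uncorrelated with the target."""
    flagged = []
    for name, values in columns.items():
        skew, kurt = skew_and_excess_kurtosis(values)
        if (
            abs(pearson(values, target)) <= max_abs_corr
            and abs(skew) <= max_abs_skew
            and abs(kurt) <= max_abs_kurt
        ):
            flagged.append(name)
    return flagged


# Usage: one informative (uniform) column and one pure-noise column.
rng = random.Random(0)
signal = [rng.uniform(0, 10) for _ in range(2000)]
target = [2 * s + rng.gauss(0, 1) for s in signal]
noise = [rng.gauss(0, 1) for _ in range(2000)]
print(noise_candidates({"signal": signal, "noise": noise}, target))
```

Pearson correlation only captures linear dependence, so a column that passes this filter is a candidate for pruning, not proof that it is noise.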

Design

Backend agnostic: all DataFrame operations go through Narwhals
Sandbox execution: LLM-generated code is compiled via compile_sandboxed
Semantic agnosticism: TrapPruner anonymises column names before LLM evaluation
Graceful degradation: if the LLM call fails, every module leaves the input data intact
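Two of these design points are easy to show in isolation:

- Column names are replaced with neutral aliases before anything reaches the LLM, and the LLM's answers are mapped back afterwards.
- A wrapper returns the input untouched when the LLM-backed step raises.

`anonymise_columns`, `restore_names` and `with_fallback` are hypothetical helpers written for this sketch; they are not functions from loclean.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def anonymise_columns(names: list[str]) -> tuple[list[str], dict[str, str]]:
    """Replace real column names with neutral aliases; return aliases and alias->name map."""
    aliases = [f"col_{i}" for i in range(len(names))]
    return aliases, dict(zip(aliases, names))


def restore_names(flagged_aliases: list[str], mapping: dict[str, str]) -> list[str]:
    """Map aliases returned by the LLM back to real names, dropping any it invented."""
    return [mapping[a] for a in flagged_aliases if a in mapping]


def with_fallback(step: Callable[[T], T], data: T) -> T:
    """Run an LLM-backed step; on any exception, hand back the original data unchanged."""
    try:
        return step(data)
    except Exception:
        return data


# Usage
aliases, mapping = anonymise_columns(["age", "income", "random_42"])
print(aliases)                                    # ['col_0', 'col_1', 'col_2']
print(restore_names(["col_2", "col_9"], mapping))  # ['random_42']


def failing_llm_step(rows):
    raise TimeoutError("LLM unavailable")


rows = [{"age": 31}]
print(with_fallback(failing_llm_step, rows) is rows)  # True
```

Dropping aliases the model was never shown (`col_9` above) is a cheap guard against hallucinated column names.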

Changes

src/loclean/extraction/trap_pruner.py [NEW]
src/loclean/extraction/missingness_recognizer.py [NEW]
src/loclean/extraction/leakage_auditor.py [NEW]
src/loclean/extraction/__init__.py: lazy imports for the three new classes
src/loclean/__init__.py: new Loclean class methods, module-level convenience functions, and an updated __all__
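The PR mentions lazy imports in `extraction/__init__.py` but does not show them. One common Python approach is a module-level `__getattr__` (PEP 562): importing the package does not pull in every submodule, and each class is imported on first access. The sketch below assumes that pattern. In a real package the mapping would hold entries such as `"TrapPruner": ".trap_pruner"`, resolved with `importlib.import_module(path, __name__)`. To keep the sketch self-contained, it builds a throwaway module with `types.ModuleType` and points at a stdlib function instead. `make_lazy_module` is a hypothetical helper.

```python
import importlib
import types
from typing import Any


def make_lazy_module(name: str, lazy: dict[str, str]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access (PEP 562)."""
    mod = types.ModuleType(name)

    def __getattr__(attr: str) -> Any:
        # Only called when normal lookup fails, i.e. the attribute is not cached yet.
        if attr in lazy:
            value = getattr(importlib.import_module(lazy[attr]), attr)
            setattr(mod, attr, value)  # cache so later lookups bypass __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def __dir__() -> list[str]:
        # Advertise lazy names in dir() even before they are resolved.
        return sorted(set(vars(mod)) | set(lazy))

    mod.__getattr__ = __getattr__
    mod.__dir__ = __dir__
    return mod


# In a real __init__.py, __getattr__ and __dir__ would be defined at top level.
extraction = make_lazy_module("demo_extraction", {"dedent": "textwrap"})
print("dedent" in vars(extraction))  # False: nothing resolved yet
print(extraction.dedent("    x"))    # first access triggers the import
print("dedent" in vars(extraction))  # True: now cached on the module
```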

Tests

39 unit tests (13 per module) covering profiling, prompt construction, verdict parsing, and mock-LLM integration.

uv run pytest tests/unit/extraction/test_trap_pruner.py tests/unit/extraction/test_missingness_recognizer.py tests/unit/extraction/test_leakage_auditor.py -v --no-cov
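The tests are described as covering verdict parsing and mock-LLM integration. The sketch below pictures both for the leakage audit: a tolerant parser for an LLM's per-column verdicts, exercised through a fake engine that returns a canned reply. Everything specific here is an assumption for illustration, since the PR does not show the real prompt format or engine interface. That includes:

- the JSON schema (`{"verdicts": [{"column", "leaks", "reason"}]}`);
- the prompt wording;
- `parse_leakage_verdicts`, `FakeLLM.generate` and `audit_leakage_sketch`.

```python
import json
import re


def parse_leakage_verdicts(raw: str, known_columns: set[str]) -> dict[str, bool]:
    """Extract {column: leaks} from an LLM reply.

    Tolerates prose around the JSON object, ignores columns the model invented,
    and returns {} on malformed output so nothing is dropped by accident.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # first '{' to last '}'
    if match is None:
        return {}
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    items = payload.get("verdicts")
    if not isinstance(items, list):
        return {}
    verdicts: dict[str, bool] = {}
    for item in items:
        if not isinstance(item, dict):
            continue
        col = item.get("column")
        if isinstance(col, str) and col in known_columns:
            verdicts[col] = item.get("leaks") is True  # only a literal true counts
    return verdicts


class FakeLLM:
    """Test double for an LLM engine: records prompts and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def audit_leakage_sketch(llm, columns: list[str], target: str) -> list[str]:
    """Ask whether each column would only be known after the target; return leaky ones."""
    prompt = (
        f"Target: {target}\n"
        f"Columns: {', '.join(columns)}\n"
        "For each column, decide whether its value is only available after the target "
        "is observed. Reply as JSON: "
        '{"verdicts": [{"column": str, "leaks": bool, "reason": str}]}'
    )
    verdicts = parse_leakage_verdicts(llm.generate(prompt), set(columns))
    return [c for c in columns if verdicts.get(c)]


reply = (
    "Sure, here is my assessment:\n"
    '{"verdicts": ['
    '{"column": "days_since_churn", "leaks": true, "reason": "recorded after churn"},'
    '{"column": "tenure_months", "leaks": false, "reason": "known beforehand"},'
    '{"column": "invented_col", "leaks": true, "reason": "hallucinated"}'
    "]}"
)
llm = FakeLLM(reply)
print(audit_leakage_sketch(llm, ["tenure_months", "days_since_churn"], "churned"))
# ['days_since_churn']
```

Returning an empty verdict map on malformed output mirrors the "graceful degradation" goal: a bad reply leaves every feature in place rather than removing columns.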

…eAuditor
- TrapPruner: statistical profiling + LLM verification of Gaussian noise columns
- MissingnessRecognizer: MNAR pattern detection with sandbox-compiled encoders
- TargetLeakageAuditor: semantic timeline evaluation for target leakage
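The `{col}_mnar` encoding named in the module table amounts to adding one boolean indicator column per affected column. The sketch below assumes the set of MNAR columns has already been decided; in the PR that decision involves the LLM and sandbox-compiled encoders, which are not reproduced here. It works on a plain dict of columns rather than Narwhals so that it runs on its own. `add_mnar_flags` is a hypothetical name.

```python
from typing import Any


def add_mnar_flags(
    table: dict[str, list[Any]], mnar_columns: list[str]
) -> dict[str, list[Any]]:
    """Return a copy of `table` with a boolean `{col}_mnar` column per MNAR column.

    True marks rows where the original value is missing (None or NaN).
    Original columns are kept as they are.
    """

    def is_missing(v: Any) -> bool:
        return v is None or (isinstance(v, float) and v != v)  # NaN != NaN

    out = dict(table)  # shallow copy: existing column lists are reused, not mutated
    for col in mnar_columns:
        if col not in table:
            continue  # skip unknown names instead of failing
        out[f"{col}_mnar"] = [is_missing(v) for v in table[col]]
    return out


table = {"income": [52000.0, None, float("nan"), 61000.0], "age": [34, 51, 29, 45]}
flagged = add_mnar_flags(table, ["income"])
print(flagged["income_mnar"])  # [False, True, True, False]
```

With Narwhals the same step would presumably be a single `with_columns` call using `nw.col(name).is_null().alias(f"{name}_mnar")`. Backends may treat NaN and null differently, which is why the sketch checks for both.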

…o public API
- Add all three to extraction/__init__.py lazy imports
- Add Loclean class methods + module-level convenience functions
- Update __all__ in loclean/__init__.py

…r, TargetLeakageAuditor
- 13 tests each (39 total) covering profiling, prompt construction, verdict parsing, verification, and mock-LLM integration

nxank4 force-pushed the feat/extraction-auditors branch 2 times, most recently from 69a140a to 2ccb620 (February 27, 2026 17:39)

nxank4 added 3 commits February 27, 2026 17:43

feat(api): wire prune_traps, recognize_missingness, audit_leakage int…

f5bd6f9

test(extraction): add unit tests for TrapPruner, MissingnessRecognize…

67c5e49

nxank4 force-pushed the feat/extraction-auditors branch from 2ccb620 to 67c5e49 (February 27, 2026 17:44)

nxank4 merged commit 900d6a0 into main Feb 27, 2026
21 checks passed

nxank4 merged 3 commits into main from feat/extraction-auditors
