feat: execution sandbox, compilation cache, and unified client refactor #69

Merged
nxank4 merged 5 commits into main from feat/generative-compilation-and-instruction-optimizer on Feb 26, 2026

Conversation

Collaborator

@nxank4 nxank4 commented Feb 25, 2026

Summary

Implements strict execution security, compilation caching, and architectural improvements for the dynamic code generation modules.

Execution Sandbox

  • compile_sandboxed() replaces raw exec with restricted __builtins__ (blocks open, exec, eval, __import__)
  • run_with_timeout() enforces a wall-clock timeout per function call (default 2s) via ThreadPoolExecutor
  • Integrated into both FeatureDiscovery and RelationalShredder
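The two sandbox helpers can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the names compile_sandboxed, run_with_timeout, and the blocked builtins come from the description above, while the signatures (including the fn_name parameter) are assumptions.

```python
import builtins
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Builtins withheld from generated code, as listed in the PR description.
BLOCKED = {"open", "exec", "eval", "__import__"}


def compile_sandboxed(source: str, fn_name: str):
    """Exec `source` in a namespace with restricted builtins.

    Hypothetical signature: returns the function named `fn_name`
    that `source` is expected to define.
    """
    safe_builtins = {k: v for k, v in vars(builtins).items() if k not in BLOCKED}
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace[fn_name]


def run_with_timeout(fn, *args, timeout_s: float = 2.0):
    """Call fn(*args) in a worker thread; raise TimeoutError after timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
    finally:
        # Do not block on the worker; a timed-out call keeps running.
        pool.shutdown(wait=False)


double = compile_sandboxed("def double(x):\n    return x * 2\n", "double")
result = run_with_timeout(double, 21)
```

A thread-based timeout only stops the caller from waiting: Python threads cannot be killed, so the timed-out function continues in the background until it returns.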

Compilation Cache

  • Extended LocleanCache with code_cache table and get_code/set_code methods
  • SHA256-based key generation from structural metadata (columns + dtypes + target_col + module_prefix)
  • Cache-first flow: on a hit, compile, verify, and apply the cached code; on a miss, run the standard LLM flow and store the generated code on success
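A key derived from structural metadata might be computed along these lines. The function name compute_code_key appears in the commit messages, but the signature and the JSON serialization used here are assumptions made for illustration.

```python
import hashlib
import json


def compute_code_key(columns, dtypes, target_col, module_prefix):
    """Return a deterministic SHA256 hex digest for a dataset's structure.

    Hypothetical signature: only column names, dtypes, the target column,
    and a module prefix feed the hash, so row values never affect the key.
    """
    payload = json.dumps(
        {
            "columns": list(columns),
            "dtypes": [str(d) for d in dtypes],
            "target_col": target_col,
            "module_prefix": module_prefix,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = compute_code_key(["age", "income"], ["int64", "float64"], "income", "feature")
```

Including the module prefix keeps entries from different modules separate even when they see identically shaped data.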

Graceful Fallback

  • When retries are exhausted, the modules now log a warning and return the unmodified DataFrame (discover) or an empty dict (shred) instead of raising RuntimeError
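The fallback behaviour can be pictured with a generic retry loop. This is a sketch only: apply_with_fallback, generate, and max_retries are hypothetical names, not identifiers from this PR.

```python
import logging

logger = logging.getLogger(__name__)


def apply_with_fallback(data, generate, max_retries: int = 3):
    """Try to generate and apply a transform; return `data` unchanged on failure.

    `generate` is a hypothetical zero-argument callable that returns a
    function to apply to `data` (standing in for the LLM + compile step).
    """
    for attempt in range(1, max_retries + 1):
        try:
            transform = generate()
            return transform(data)
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_retries, exc)
    # Retries exhausted: warn and hand back the input instead of raising.
    logger.warning("retries exhausted; returning input unchanged")
    return data
```

Returning the input unchanged lets a multi-step pipeline continue past a step that could not be generated, at the cost of that step silently becoming a no-op unless the warning is monitored.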

Unified Client Refactor

  • Lazy __getattr__ imports in extraction/__init__.py
  • _resolve_engine() helper replaces 9 duplicate engine-creation blocks
  • 6 new Loclean wrapper methods (clean, resolve_entities, oversample, etc.)
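Lazy imports of this kind typically rely on the module-level __getattr__ hook from PEP 562. The sketch below builds a throwaway module object so it can run standalone; the mapping and make_lazy_module are hypothetical, and standard-library classes stand in for the advanced extraction modules.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module path, attribute name).
_LAZY = {
    "Fraction": ("fractions", "Fraction"),
    "OrderedDict": ("collections", "OrderedDict"),
}


def make_lazy_module(name, lazy_map):
    """Create a module whose listed attributes are imported on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr):
        try:
            module_path, symbol = lazy_map[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_path), symbol)
        setattr(mod, attr, value)  # cache so the hook runs once per name
        return value

    # PEP 562: a __getattr__ in the module namespace handles missing names.
    mod.__getattr__ = __getattr__
    return mod


pkg = make_lazy_module("pkg", _LAZY)
half = pkg.Fraction(1, 2)  # the underlying import happens here
```

In a real package the same effect comes from defining `def __getattr__(name)` directly at the top level of `__init__.py`.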

New Modules

  • EntityResolver: LLM-driven entity resolution
  • Oversampler: SMOTE-based minority class oversampling
  • PipelineOrchestrator: configurable multi-step cleaning pipeline
  • QualityGate: statistical data quality validation

Testing

  • All quality gates pass: ruff format, ruff check, mypy
  • All unit tests pass (625+) with zero regressions

… enforcement

- Add compile_sandboxed() for namespace-restricted exec (blocks open, exec, eval, __import__)
- Add run_with_timeout() using ThreadPoolExecutor for wall-clock timeout (default 2s)
- Integrate sandbox into FeatureDiscovery._compile_function and _verify_function
- Integrate sandbox into RelationalShredder._compile_function and _verify_function
- Add timeout_s constructor parameter to both modules
- Add 15 sandbox unit tests and update existing module tests
…ceful fallback

- Extend LocleanCache with code_cache table and get_code/set_code methods
- Add compute_code_key() for deterministic SHA256 hash from structural metadata
- Integrate cache hit/miss into FeatureDiscovery.discover() and RelationalShredder.shred()
- Replace RuntimeError with graceful fallback on exhausted retries
  - discover() returns unmodified DataFrame
  - shred() returns empty dict
- Add 12 cache-specific tests covering key generation, roundtrip, and fallback
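As an illustration of the code_cache table and its get_code/set_code accessors, here is a small SQLite-backed sketch. The class name, column names, and the choice of SQLite are assumptions; the PR text does not show how LocleanCache is stored.

```python
import sqlite3
from typing import Optional


class CodeCacheSketch:
    """Hypothetical stand-in for the code-cache part of LocleanCache."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS code_cache ("
            "key TEXT PRIMARY KEY, source TEXT NOT NULL)"
        )

    def get_code(self, key: str) -> Optional[str]:
        """Return the cached source for `key`, or None on a miss."""
        row = self._conn.execute(
            "SELECT source FROM code_cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set_code(self, key: str, source: str) -> None:
        """Store or overwrite the source for `key`."""
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO code_cache (key, source) VALUES (?, ?)",
                (key, source),
            )


cache = CodeCacheSketch()
cache.set_code("abc123", "def f(x):\n    return x\n")
```

Cached source still goes through compilation and verification on a hit, as the summary describes, so a stale or corrupted entry fails verification rather than being applied blindly.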
…d wrapper methods

- Switch extraction/__init__.py to lazy __getattr__ imports for advanced modules
- Add _resolve_engine() helper replacing 9 duplicate engine-creation blocks
- Add 6 wrapper methods to Loclean class (clean, resolve_entities, oversample,
  shred_to_relations, discover_features, validate_quality)
- Add 9 unit tests for wrapper methods and engine resolution
- EntityResolver: LLM-driven entity resolution with fuzzy matching
- Oversampler: SMOTE-based minority class oversampling via LLM
- Add comprehensive unit tests for both modules
@nxank4 nxank4 force-pushed the feat/generative-compilation-and-instruction-optimizer branch 2 times, most recently from d870207 to 52690ec on February 25, 2026 18:42
- PipelineOrchestrator: configurable multi-step data cleaning pipeline
- QualityGate: statistical data quality validation with threshold checks
- Add comprehensive unit tests for both modules
@nxank4 nxank4 force-pushed the feat/generative-compilation-and-instruction-optimizer branch from 52690ec to 4bc7924 on February 25, 2026 18:45
@nxank4 nxank4 merged commit c932198 into main Feb 26, 2026
21 checks passed
@nxank4 nxank4 deleted the feat/generative-compilation-and-instruction-optimizer branch February 26, 2026 10:12