Closed
Description
Overview
During a deep dive into the evaluation pipeline of Versor, I identified and resolved data leakage issues in the QM9 and 20 Newsgroups benchmarks. While the core Geometric Blade Network (GBN) architecture remains robust, these fixes ensure that the reported performance metrics are derived from a strictly isolated and deterministic evaluation protocol.
Detailed Fixes
- QM9: Deterministic Splitting & Statistics Isolation
  - Issue: The previous DataLoader implementation used a dynamic random seed for dataset shuffling, which could cause overlap between the training and validation sets across epochs. In addition, the normalization statistics (mean/std) were inadvertently computed over the entire dataset.
  - Fix: Refactored datasets/qm9.py around get_qm9_loaders().
    - Deterministic Split: A fixed seed (42) is enforced for a one-time, consistent Train/Val/Test split.
    - Proper Normalization: Statistics are now computed strictly on the training set and then applied to the Val/Test sets to prevent distribution leakage.
- 20 Newsgroups: PCA Feature Basis Isolation
  - Issue: PCA was being fitted independently on the training and test sets. In a rigorous ML pipeline, the feature basis (PCA components) must be learned only from the training distribution.
  - Fix: Implemented a "Fit on Train, Transform All" strategy. The PCA model is fitted on the training data, saved, and then loaded to transform the test set, which keeps the feature space mapping consistent without using test-set information.
- HAR (Human Activity Recognition)
  - Status: Verified as correct. The implementation relies on the pre-split train.csv and test.csv files (split by subject), which adheres to industry standards.
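For reference, the QM9 fix described above (a one-time split from a fixed seed, plus normalization statistics taken from the training set only) can be sketched roughly as follows. This is a minimal NumPy illustration, not the actual contents of datasets/qm9.py: the helper names `split_indices` and `normalize_with_train_stats`, the 80/10/10 split fractions, and the toy target array are all assumptions made for the example.

```python
import numpy as np


def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Hypothetical helper: one-time Train/Val/Test split from a fixed seed."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # same permutation on every call
    n_val = int(n_samples * val_frac)
    n_test = int(n_samples * test_frac)
    val_idx = perm[:n_val]
    test_idx = perm[n_val:n_val + n_test]
    train_idx = perm[n_val + n_test:]
    return train_idx, val_idx, test_idx


def normalize_with_train_stats(targets, train_idx):
    """Hypothetical helper: mean/std come from training rows only."""
    mean = targets[train_idx].mean(axis=0)
    std = targets[train_idx].std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    # The training statistics are applied to every row, including Val/Test.
    return (targets - mean) / std, mean, std


# Toy stand-in for a regression target (assumed data, not QM9 itself).
targets = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(1000, 1))

train_idx, val_idx, test_idx = split_indices(len(targets))
normalized, mean, std = normalize_with_train_stats(targets, train_idx)
```

Because the permutation depends only on the seed, the three index sets are disjoint and identical across runs, and the validation and test rows never contribute to `mean` or `std`.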
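The "Fit on Train, Transform All" strategy for 20 Newsgroups can be illustrated in a similar way. The issue does not say which PCA implementation the pipeline uses, so this sketch substitutes a plain NumPy SVD as a stand-in. The helper names `fit_pca` and `transform_pca`, the file name `pca_basis.npz`, and the random feature matrices are assumptions for the example only.

```python
import os
import tempfile

import numpy as np


def fit_pca(train_features, n_components):
    """Hypothetical helper: learn the PCA basis from training rows only."""
    mean = train_features.mean(axis=0)
    centered = train_features - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def transform_pca(features, mean, components):
    """Project any split onto the basis that was learned from training data."""
    return (features - mean) @ components.T


# Toy stand-ins for document feature matrices (assumed data).
rng = np.random.default_rng(42)
train_x = rng.normal(size=(200, 20))
test_x = rng.normal(size=(50, 20))

# Fit on the training split only.
mean, components = fit_pca(train_x, n_components=5)

# Save the fitted basis, then load it back before transforming.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pca_basis.npz")
    np.savez(path, mean=mean, components=components)
    with np.load(path) as saved:
        loaded_mean = saved["mean"]
        loaded_components = saved["components"]

# Both splits pass through the same train-derived mapping.
train_z = transform_pca(train_x, loaded_mean, loaded_components)
test_z = transform_pca(test_x, loaded_mean, loaded_components)
```

The test features are only ever passed to `transform_pca`, so neither the centering mean nor the components depend on test-set information.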
Impact & Next Steps
Semantic Task: Preliminary results show that even after fixing the leakage, the Geometric Disentanglement (
QM9 Task: Re-testing is currently underway with the new rigorous pipeline. Updated, "clean" benchmarks will be published soon.