Dev/v0.2.3. Add experimental stacking classifier workflow by NanoBiostructuresRG · Pull Request #15 · NanoBiostructuresRG/melite

NanoBiostructuresRG · 2026-06-01T18:07:47Z

Summary

This PR adds the MELITE v0.2.3 experimental stacking workflow.

Main changes:

Adds an opt-in stack model key based on sklearn StackingClassifier.
Keeps F1 macro as the main scoring and model-selection metric.
Keeps standalone SVC behavior unchanged as StandardScaler -> SVC.
Uses stack_method="predict_proba" for stacking.
Uses a stacked SVC base estimator with StandardScaler -> SVC(probability=True).
Keeps Random Forest and XGBoost unscaled because they are tree-based models.
Uses LogisticRegression as the initial final estimator.
Preserves .pkl export through joblib.
Adds export and prediction support for stacking models.
Documents the stacking workflow and its internal CV behavior.
Adds .tmp_stack_smoke/ to .gitignore for local smoke-test outputs.
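
The estimator layout listed above can be sketched with plain scikit-learn as follows. This is a minimal illustration, not MELITE's actual builder: `build_stack`, the hyperparameter values, the synthetic data, and the shuffled `StratifiedKFold` are assumptions, and XGBoost is left out so the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stack(n_splits=5, random_state=0):
    """Hypothetical helper; MELITE's real stack builder may differ."""
    # The stacked SVC is scaled inside its own pipeline and needs
    # probability=True so that predict_proba is available to the stack.
    svc = make_pipeline(
        StandardScaler(),
        SVC(probability=True, random_state=random_state),
    )
    # Tree-based base estimators are left unscaled.
    # (The PR also stacks XGBoost; it is omitted here on purpose.)
    rf = RandomForestClassifier(n_estimators=50, random_state=random_state)
    # Single, non-repeated internal CV (shuffle=True is an assumption).
    inner_cv = StratifiedKFold(
        n_splits=n_splits, shuffle=True, random_state=random_state
    )
    return StackingClassifier(
        estimators=[("svc", svc), ("rf", rf)],
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=inner_cv,
    )


# Synthetic binary data, used only to show that the stack fits and predicts.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
stack = build_stack().fit(X, y)
proba = stack.predict_proba(X[:5])
print(stack.predict(X[:5]), proba.shape)
```

Keeping the scaler inside the SVC pipeline means only that base estimator sees scaled features, while the tree-based estimators receive the raw inputs.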

Notes

The stacking-internal CV uses the configured split count and random state but no repeated splits. sklearn's StackingClassifier builds its out-of-fold meta-features with cross_val_predict, which requires the test folds to partition the training set. As a result, each training sample contributes exactly one out-of-fold prediction for training the final estimator.
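
The difference between a single and a repeated splitter can be checked directly. The sketch below counts how often each sample is held out under each splitter, then passes both to `cross_val_predict`. The data, the split counts, and the helper name `held_out_counts` are illustrative assumptions, not MELITE code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_val_predict,
)

X, y = make_classification(n_samples=60, n_features=6, random_state=0)


def held_out_counts(cv):
    """Count how many times each sample lands in a held-out fold."""
    counts = np.zeros(len(y), dtype=int)
    for _, test_idx in cv.split(X, y):
        counts[test_idx] += 1
    return counts


single = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)

single_counts = held_out_counts(single)      # each sample held out once
repeated_counts = held_out_counts(repeated)  # each sample held out n_repeats times

# A single splitter partitions the data, so cross_val_predict returns
# one row of out-of-fold probabilities per training sample.
clf = LogisticRegression(max_iter=1000)
oof = cross_val_predict(clf, X, y, cv=single, method="predict_proba")

# A repeated splitter is not a partition; cross_val_predict rejects it.
try:
    cross_val_predict(clf, X, y, cv=repeated, method="predict_proba")
    repeated_accepted = True
except ValueError:
    repeated_accepted = False

print(oof.shape, repeated_accepted)
```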

The outer MELITE grid search and reporting workflow still uses the existing repeated CV/F1 evaluation.
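
One way to picture the two CV layers together is a repeated outer search scored with F1 macro wrapped around a stack that uses a single inner splitter. This is a rough sketch under assumptions: the parameter grid, split counts, and synthetic data are hypothetical, XGBoost is omitted, and `GridSearchCV` stands in for MELITE's own grid search and reporting code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    StratifiedKFold,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Inner CV: single, non-repeated splitter used only to build meta-features.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
stack = StackingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=inner_cv,
)

# Outer CV: repeated splits, scored with F1 macro for model selection.
outer_cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)
search = GridSearchCV(
    stack,
    param_grid={"final_estimator__C": [0.1, 1.0]},  # hypothetical grid
    scoring="f1_macro",
    cv=outer_cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The repeated splitter is valid in the outer loop because the search only averages scores over folds and never assembles per-sample out-of-fold predictions.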

Optuna, MLflow, model registry behavior, and .pkl replacement are intentionally out of scope.

Validation

Unit and integration validation:

python -m pytest -p no:cacheprovider tests\test_model_training.py tests\test_export.py tests\test_predict.py tests\test_version.py
- 47 passed

Full test suite:

python -m pytest -p no:cacheprovider
- 125 passed, 1 warning

Version checks:

melite --version
- MELITE 0.2.3
python -m melite.cli --version
- MELITE 0.2.3
python -c "import melite; print(melite.__version__)"
- 0.2.3

Real stack smoke test:

Temporary TOML copied from examples/example_config.toml.
Changed [models].active to ["stack"].
Changed output path to .tmp_stack_smoke/output/.
Ran python -m melite.cli run --smoke --config .tmp_stack_smoke\stack_smoke.toml.
Generated results.csv with model_name = StackingClassifier.
Exported with:

python -m melite.cli export --config .tmp_stack_smoke\stack_smoke.toml --csv .tmp_stack_smoke\output\results.csv --outdir .tmp_stack_smoke\output --row 0 --force
Exported model:

.tmp_stack_smoke/output/Model_StackingClassifier_sample_pca70.pkl
Loaded exported .pkl successfully as:

sklearn.ensemble._stacking.StackingClassifier
Confirmed:
- predict: True
- predict_proba: True
- prediction output example: [1 1 1 1 1]
- probability output shape: (5, 2)
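
A load check of this kind can be reproduced with a joblib round trip. The sketch below is an assumption-laden stand-in for the smoke test: it fits a small stack on synthetic data and writes it to a temporary file with a hypothetical name, rather than using MELITE's export command or the exported model above.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=6, random_state=0)

model = StackingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=3,
).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stack_demo.pkl"  # hypothetical file name
    joblib.dump(model, path)
    loaded = joblib.load(path)

# Same checks as the smoke test: both methods exist and return sane shapes.
has_predict = hasattr(loaded, "predict")
has_proba = hasattr(loaded, "predict_proba")
preds = loaded.predict(X[:5])
proba_shape = loaded.predict_proba(X[:5]).shape
print(type(loaded).__name__, has_predict, has_proba, proba_shape)
```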

Other checks:

git diff --check
- passed, with only Windows LF-to-CRLF warnings
Final branch status before push:
- clean

NanoBiostructuresRG added 4 commits June 1, 2026 10:07

97d9fce feat: add experimental stacking classifier
b6ac1ed docs: document v0.2.3 stacking workflow
ad5f3cb docs: clarify stacking internal cross-validation
dbf84b3 chore: ignore local stack smoke outputs

NanoBiostructuresRG merged commit 9c11ebd into main Jun 1, 2026
6 checks passed

NanoBiostructuresRG merged 4 commits into main from dev/v0.2.3
