Dev/v0.2.3. Add experimental stacking classifier workflow #15

Merged
NanoBiostructuresRG merged 4 commits into main from dev/v0.2.3 on Jun 1, 2026
@NanoBiostructuresRG

Summary

This PR adds the MELITE v0.2.3 experimental stacking workflow.

Main changes:

  • Adds an opt-in stack model key based on sklearn StackingClassifier.
  • Keeps F1 macro as the main scoring and model-selection metric.
  • Keeps standalone SVC behavior unchanged as StandardScaler -> SVC.
  • Uses stack_method="predict_proba" for stacking.
  • Uses a stacked SVC base estimator with StandardScaler -> SVC(probability=True).
  • Keeps Random Forest and XGBoost unscaled because they are tree-based models.
  • Uses LogisticRegression as the initial final estimator.
  • Preserves .pkl export through joblib.
  • Adds export and prediction support for stacking models.
  • Documents the stacking workflow and its internal CV behavior.
  • Adds .tmp_stack_smoke/ to .gitignore for local smoke-test outputs.
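The estimator layout listed above can be sketched roughly as follows. This is a minimal illustration, not MELITE's actual builder: `build_stack`, its parameters, the synthetic data, and the hyperparameter values are all assumptions made for the example. XGBoost is left out to keep the sketch free of extra dependencies; it would be appended to the estimator list unscaled, like the Random Forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stack(n_splits=5, random_state=42):
    """Hypothetical helper; the real MELITE builder may differ."""
    estimators = [
        # SVC is scale-sensitive, so it is wrapped with its own scaler.
        # probability=True is needed so the pipeline exposes predict_proba.
        (
            "svc",
            make_pipeline(
                StandardScaler(),
                SVC(probability=True, random_state=random_state),
            ),
        ),
        # Tree-based model, left unscaled. An XGBoost classifier would be
        # added here in the same way (omitted in this sketch).
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ]
    # Single-pass stratified K-fold for the stacking-internal CV.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=cv,
    )


# Synthetic binary data, used only to show that the stack fits and predicts.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
stack = build_stack().fit(X, y)
proba = stack.predict_proba(X[:5])
```

Wrapping only the SVC in a pipeline keeps the scaler local to that base estimator, so the tree-based models still see the raw features.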

Notes

The stacking-internal CV uses the configured split count and random state, but no repeated splits. sklearn's StackingClassifier builds its out-of-fold meta-features with cross_val_predict, which only accepts CV schemes that partition the training data. As a result, each training sample contributes exactly one out-of-fold prediction to the training set of the final estimator.
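The partition requirement can be checked directly against `cross_val_predict`. The snippet below is a small standalone sketch on synthetic data; the estimator, split counts, and random states are placeholders rather than MELITE's configured values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_val_predict,
)

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Single-pass stratified K-fold: every sample lands in exactly one test fold,
# so there is exactly one out-of-fold prediction per sample.
single = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
oof = cross_val_predict(clf, X, y, cv=single, method="predict_proba")

# Repeated K-fold places each sample in several test folds, which is not a
# partition, so cross_val_predict rejects it with a ValueError.
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)
try:
    cross_val_predict(clf, X, y, cv=repeated, method="predict_proba")
    repeated_ok = True
except ValueError:
    repeated_ok = False
```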

The outer MELITE grid search and reporting workflow still use the existing repeated CV/F1 evaluation.
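How the two CV levels fit together might look like the sketch below, assuming the outer search is a `GridSearchCV` over a `RepeatedStratifiedKFold` scored with `f1_macro`. The grid values, split counts, and data are illustrative assumptions and are not taken from MELITE's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    StratifiedKFold,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Inner CV: a single partition, used only to build out-of-fold meta-features.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
stack = StackingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=inner_cv,
)

# Outer CV: repeated splits are fine here because the search only scores
# each fold; it does not need one prediction per sample.
outer_cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=42)
search = GridSearchCV(
    stack,
    param_grid={"final_estimator__C": [0.1, 1.0]},  # hypothetical grid
    scoring="f1_macro",
    cv=outer_cv,
)
search.fit(X, y)
```

Each outer training fold is re-partitioned by the inner CV when the stack is fitted, so the meta-features never include predictions made on data the base estimators were trained on.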

Optuna, MLflow, model registry behavior, and .pkl replacement are intentionally out of scope.

Validation

Unit and integration validation:

  • python -m pytest -p no:cacheprovider tests\test_model_training.py tests\test_export.py tests\test_predict.py tests\test_version.py
    • 47 passed

Full test suite:

  • python -m pytest -p no:cacheprovider
    • 125 passed, 1 warning

Version checks:

  • melite --version
    • MELITE 0.2.3
  • python -m melite.cli --version
    • MELITE 0.2.3
  • python -c "import melite; print(melite.__version__)"
    • 0.2.3

Real stack smoke test:

  • Temporary TOML copied from examples/example_config.toml.

  • Changed [models].active to ["stack"].

  • Changed output path to .tmp_stack_smoke/output/.

  • Ran python -m melite.cli run --smoke --config .tmp_stack_smoke\stack_smoke.toml.

  • Generated results.csv with model_name = StackingClassifier.

  • Exported with:

    python -m melite.cli export --config .tmp_stack_smoke\stack_smoke.toml --csv .tmp_stack_smoke\output\results.csv --outdir .tmp_stack_smoke\output --row 0 --force

  • Exported model:

    .tmp_stack_smoke/output/Model_StackingClassifier_sample_pca70.pkl

  • Loaded exported .pkl successfully as:

    sklearn.ensemble._stacking.StackingClassifier

  • Confirmed:

    • predict: True
    • predict_proba: True
    • prediction output example: [1 1 1 1 1]
    • probability output shape: (5, 2)
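The load-and-confirm step can be reproduced in isolation with a joblib round trip. The sketch below fits a small stand-in stack on synthetic data and writes it to a temporary file; the file name and data are placeholders, and it does not use the MELITE export command or the exported model above.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=10, random_state=0)

# Stand-in model with the same overall shape as the stacking workflow.
stack = StackingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=3,
).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stack_demo.pkl"  # placeholder file name
    joblib.dump(stack, path)
    loaded = joblib.load(path)

# Same checks as in the smoke test: both methods exist and return the
# expected shapes for five samples of a binary problem.
has_predict = callable(getattr(loaded, "predict", None))
has_predict_proba = callable(getattr(loaded, "predict_proba", None))
preds = loaded.predict(X[:5])
proba = loaded.predict_proba(X[:5])
```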

Other checks:

  • git diff --check
    • passed, with only Windows LF-to-CRLF warnings
  • Final branch status before push:
    • clean

@NanoBiostructuresRG NanoBiostructuresRG merged commit 9c11ebd into main Jun 1, 2026
6 checks passed