Merged
26 changes: 2 additions & 24 deletions .github/workflows/ci.yml
@@ -62,30 +62,8 @@ jobs:
       - name: Check distributions
         run: python -m twine check dist/*

-      - name: Smoke install built wheel
-        shell: bash
-        run: |
-          set -euo pipefail
-          WHEEL="$(ls dist/*.whl | head -n 1)"
-          python -m venv .smoke_venv
-          .smoke_venv/bin/python -m pip install --upgrade pip
-          .smoke_venv/bin/python -m pip install "$WHEEL"
-          mkdir -p .smoke_outside_checkout
-          cd .smoke_outside_checkout
-          ../.smoke_venv/bin/melite --version
-          ../.smoke_venv/bin/python -c "
-          import melite
-          expected = ['Config', 'load_datasets', 'plot_cv_distributions', 'predict', '__version__']
-          assert melite.__all__ == expected, melite.__all__
-          for name in expected:
-              assert hasattr(melite, name), f'{name} missing'
-          assert 'load_dataset' not in melite.__all__, 'load_dataset must not be top-level public API'
-          assert 'ResultManager' not in melite.__all__, 'ResultManager must not be top-level public API'
-          assert not hasattr(melite, 'Pipeline'), 'Pipeline must not be public'
-          from melite.result_manager import ResultManager
-          assert ResultManager is not None, 'ResultManager internal import missing'
-          print(melite.__version__, 'wheel OK')
-          "
+      - name: Smoke installed wheel toy workflow
+        run: python scripts/smoke_install_wheel.py

       - name: Smoke install sdist
         shell: bash
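The new step delegates to `scripts/smoke_install_wheel.py`, whose contents are not shown in this diff. As a rough sketch only, the install-outside-checkout part of the removed inline step could be expressed in Python as below. The helper names `venv_bin` and `smoke_commands` are hypothetical; only `python -m venv`, the two `pip install` calls, and `melite --version` are taken from the removed step.

```python
import os
from pathlib import Path


def venv_bin(venv_dir: Path) -> Path:
    """Return the directory that holds a venv's executables (platform dependent)."""
    return venv_dir / ("Scripts" if os.name == "nt" else "bin")


def smoke_commands(wheel: Path, venv_dir: Path, python: str = "python") -> list[list[str]]:
    """Build the argv lists for a wheel smoke install (hypothetical helper)."""
    bin_dir = venv_bin(venv_dir)
    venv_python = str(bin_dir / "python")
    return [
        [python, "-m", "venv", str(venv_dir)],                     # create the venv
        [venv_python, "-m", "pip", "install", "--upgrade", "pip"],  # refresh pip
        [venv_python, "-m", "pip", "install", str(wheel)],          # install the built wheel
        [str(bin_dir / "melite"), "--version"],                     # CLI entry point check
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Using absolute paths and running the final command from a directory outside the checkout is what ensures `melite` resolves to the installed wheel rather than the source tree.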
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

---

+## [v0.2.1] - 2026-05-27
+
+### Changed
+- `[models].active` / `ACTIVE_MODELS` now controls which model families are
+  trained during benchmarking.
+- `melite export` now uses strict dataset loading for registry-based datasets
+  and no longer falls back to `arr.files[0]` when an `.npz` file lacks `X`.
+- Added an installed-wheel smoke test that builds the wheel, installs it
+  outside the repository checkout, runs a toy `[datasets.toy]` smoke benchmark,
+  exports row 0 non-interactively, and verifies generated artifacts.
+
+### Compatibility
+- The top-level public API remains unchanged:
+  `Config`, `load_datasets`, `plot_cv_distributions`, `predict`, and
+  `__version__`.
+- Legacy `reduction_type` + `level` export rows remain supported, but
+  individual legacy `.npz` files must now contain an explicit `X` array.
+
+---
+
## [v0.2.0] - 2026-05-26

### Added
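The v0.2.1 export change replaces a silent fallback to `arr.files[0]` with a hard failure when `X` is absent. The following is a minimal sketch of that rule and makes no claim about MELITE's actual loader: it relies only on the fact that an `.npz` file is a zip archive whose members are named `<array>.npy`, so the check needs nothing beyond the standard library. The function names `npz_array_names` and `require_x` and the error message are illustrative.

```python
import zipfile
from pathlib import Path


def npz_array_names(path: Path) -> list[str]:
    """List the array names stored in an .npz archive (members are '<name>.npy')."""
    with zipfile.ZipFile(path) as archive:
        return [n[: -len(".npy")] for n in archive.namelist() if n.endswith(".npy")]


def require_x(path: Path) -> None:
    """Raise if the archive has no explicit 'X' array; never fall back to the first one."""
    names = npz_array_names(path)
    if "X" not in names:
        raise KeyError(f"{path}: missing required array 'X' (found: {names})")
```

A real loader would go on to check that `X` is numeric and two-dimensional and that its length matches the labels, as the README describes; those checks need NumPy and are left out here.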
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -2,8 +2,8 @@ cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
title: "MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments"
-version: "0.2.0"
-date-released: "2026-05-26"
+version: "0.2.1"
+date-released: "2026-05-27"
authors:
- family-names: "Contreras-Torres"
given-names: "Flavio F."
23 changes: 17 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

[![CI](https://github.com/NanoBiostructuresRG/melite/actions/workflows/ci.yml/badge.svg)](https://github.com/NanoBiostructuresRG/melite/actions/workflows/ci.yml)
[![License: LGPL v3](https://img.shields.io/badge/License-LGPL_v3-blue.svg)](LICENSE)
-[![Version](https://img.shields.io/badge/version-v0.1.11-blue.svg)]()
+[![Version](https://img.shields.io/badge/version-v0.2.1-blue.svg)]()
[![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue)]()

**MELITE** is a pre-stable Python toolkit for tabular classification
@@ -21,7 +21,7 @@ Project: MELITE
PyPI distribution: melite
Import package: melite
CLI: melite
-Version: 0.2.0
+Version: 0.2.1
License: LGPL-3.0-or-later
Status: alpha / pre-stable
```
@@ -133,6 +133,16 @@ Registered datasets are loaded strictly: missing files, missing `X`, non-2D or
non-numeric `X`, length mismatches, and embedded `y` mismatches fail the run.
Legacy `[benchmark].reduction_types` and `levels` configs are still accepted
and are normalized into equivalent dataset entries such as `PCA70` and `UMAP90`.
+
+Model families are controlled by `[models].active`:
+
+```toml
+[models]
+active = ["svc", "rf", "xgb"]
+```
+
+Remove a key to skip that family during training. Valid keys are `svc`, `rf`,
+and `xgb`.

## CLI

@@ -165,7 +175,7 @@ from melite import __version__
```

Modules not listed above are importable directly but are not part of the public
-contract and may change before 0.2.0.
+contract and may change before 1.0.

## Input Format

@@ -196,13 +206,14 @@ Local inputs and generated artifacts such as `raw/`, `data/`, `output/`,

## Validation

-The current `dev/v0.2.0` branch targets:
+The current `dev/v0.2.1` branch targets:

```bash
python -m pytest tests/ -v --basetemp=.review_pytest_tmp -o cache_dir=.review_pytest_cache
mkdocs build --strict
-python -m build
+python -m build --no-isolation
python -m twine check dist/*
+python scripts/smoke_install_wheel.py
melite --help
melite run --help
melite export --help
@@ -216,7 +227,7 @@ If you use MELITE in your research, please cite it using the metadata in

```text
Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model
-Evaluation and Learning for Inference-ready Tabular Experiments (0.1.11).
+Evaluation and Learning for Inference-ready Tabular Experiments (0.2.1).
Tecnologico de Monterrey. https://github.com/NanoBiostructuresRG/melite
```

2 changes: 1 addition & 1 deletion docs/api.md
@@ -1,7 +1,7 @@
# API Reference

MELITE exposes an intended public API through five symbols. The project is
-pre-stable, so this API may change before 0.2.0. Internal modules are importable
+pre-stable, so this API may change before 1.0. Internal modules are importable
directly but are not part of the public contract.

```python
86 changes: 63 additions & 23 deletions docs/index.md
@@ -21,17 +21,17 @@
</div>
<div class="ms-badges" aria-label="Project badges">
<img alt="CI" src="https://github.com/NanoBiostructuresRG/melite/actions/workflows/ci.yml/badge.svg">
-<img alt="Version" src="https://img.shields.io/badge/version-v0.1.11-blue.svg">
+<img alt="Version" src="https://img.shields.io/badge/version-v0.2.1-blue.svg">
<img alt="Python versions" src="https://img.shields.io/badge/python-3.11%20%7C%203.12-blue">
<img alt="License: LGPL v3+" src="https://img.shields.io/badge/License-LGPL_v3%2B-blue.svg">
</div>
</div>
</section>

!!! note "Pre-stable"
-    MELITE is currently in alpha-stage development (`v0.1.x`). Publication on
+    MELITE is currently in alpha-stage development (`v0.2.x`). Publication on
     PyPI is prepared under the package name `melite`. Public APIs may
-    change before 0.2.0.
+    change before 1.0.

## Workflow

@@ -118,32 +118,72 @@ industrial features, or manually selected numeric features.
MELITE uses a dataset registry under `[datasets.<dataset_id>]`. Each
`dataset_id` names one concrete numeric `X` matrix candidate.

-```toml
-[datasets.morgan_r2_2048]
-path = "data/morgan_r2_2048.npz"
-label_path = "raw/labels.npy"
-family = "fingerprints"
-method = "Morgan"
-
-[datasets.rdkit_descriptors]
-path = "data/rdkit_descriptors.npz"
-label_path = "raw/labels.npy"
-family = "descriptors"
-method = "RDKit"
-
-[datasets.pca85]
-path = "data/PCA85.npz"
-label_path = "raw/labels.npy"
-family = "dimensionality"
-method = "PCA"
-level = 85
-```
+<section class="ms-dataset-panel" aria-label="Dataset registry examples">
+<div class="ms-dataset-panel__intro">
+<span class="ms-dataset-panel__kicker">Registry pattern</span>
+<strong>One dataset id, one numeric matrix.</strong>
+<p>Use metadata for reporting and traceability; execution follows the
+registered files, not hardcoded dataset families.</p>
+</div>
+</section>
+
+=== "Fingerprints"
+
+    ```toml
+    [datasets.morgan_r2_2048]
+    path = "data/morgan_r2_2048.npz"
+    label_path = "raw/labels.npy"
+    family = "fingerprints"
+    method = "Morgan"
+    variant = "radius2_2048"
+    ```
+
+    `morgan_r2_2048` is just a user-defined id. MELITE treats it as a concrete
+    feature matrix candidate and reports the metadata with its results.
+
+=== "Descriptors"
+
+    ```toml
+    [datasets.rdkit_descriptors]
+    path = "data/rdkit_descriptors.npz"
+    label_path = "raw/labels.npy"
+    family = "descriptors"
+    method = "RDKit"
+    description = "Curated numeric descriptor table"
+    ```
+
+    Descriptor tables follow the same strict contract: numeric, two-dimensional
+    `X`, plus a label vector loaded from `label_path`.
+
+=== "Dimensionality"
+
+    ```toml
+    [datasets.pca85]
+    path = "data/PCA85.npz"
+    label_path = "raw/labels.npy"
+    family = "dimensionality"
+    method = "PCA"
+    level = 85
+
+    [datasets.umap90]
+    path = "data/UMAP90.npz"
+    label_path = "raw/labels.npy"
+    family = "dimensionality"
+    method = "UMAP"
+    level = 90
+    ```
+
+    PCA and UMAP are ordinary dataset entries. `method` and `level` preserve
+    legacy reporting context without driving special execution logic.

Required fields are `path` and `label_path`; optional metadata fields are
`family`, `method`, `variant`, `level`, and `description`. Legacy
`[benchmark].reduction_types` and `levels` configs are still normalized into
dataset entries when `[datasets]` is absent.

+Each `.npz` dataset must contain an explicit `X` array; missing `X` fails
+strict dataset loading.
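
The required/optional split described above can be sketched as a small check over one parsed `[datasets.<dataset_id>]` table. The field names come from the documentation; the function name `check_entry`, the shape of the returned dict, and the choice to report unrecognized keys instead of rejecting them are assumptions made for illustration, not MELITE's behavior.

```python
REQUIRED_FIELDS = {"path", "label_path"}
OPTIONAL_FIELDS = {"family", "method", "variant", "level", "description"}


def check_entry(dataset_id: str, entry: dict) -> dict:
    """Split one registry entry into file paths and reporting metadata."""
    missing = sorted(REQUIRED_FIELDS - set(entry))
    if missing:
        raise ValueError(f"[datasets.{dataset_id}] missing required fields: {missing}")
    return {
        "id": dataset_id,
        "path": entry["path"],
        "label_path": entry["label_path"],
        # Metadata is carried along for reporting only.
        "metadata": {k: v for k, v in entry.items() if k in OPTIONAL_FIELDS},
        # Assumption: unknown keys are surfaced to the caller, not rejected.
        "unrecognized": sorted(set(entry) - REQUIRED_FIELDS - OPTIONAL_FIELDS),
    }
```

Keeping metadata separate from the two path fields mirrors the documented design, in which execution follows the registered files and `family`, `method`, and `level` only label the results.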

## Quick Example

```bash
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -49,5 +49,5 @@ melite --version
Expected version for this release:

```text
-MELITE 0.1.11
+MELITE 0.2.1
```
39 changes: 15 additions & 24 deletions docs/release.md
@@ -1,29 +1,19 @@
# Release Notes

-MELITE `0.2.0` introduces the generalized tabular dataset registry and keeps
-legacy PCA/UMAP configuration compatibility.
+MELITE `0.2.1` hardens the generalized tabular dataset workflow while
+preserving the top-level public API.

-## 0.2.0 Highlights
+## 0.2.1 Highlights

-- Registers concrete tabular matrices under `[datasets.<dataset_id>]`.
-- Requires `path` and `label_path`; preserves optional metadata fields
-  `family`, `method`, `variant`, `level`, and `description`.
-- Runs benchmarks through strict `cfg.DATASETS` loading.
-- Exports dataset-based artifacts such as `Model_SVC_morgan_r2_2048.pkl`.
-- Falls back to legacy `reduction_type` + `level` export rows for older CSVs.
-
-## 0.1.11 Highlights
-
-MELITE `0.1.11` prepared the project documentation and package metadata for
-the first PyPI publication as `melite`.
-
-- Uses final release metadata version `0.1.11`.
-- Clarifies that MELITE is tabular at the modeling level and consumes numeric
-  `X` and `y` arrays.
-- Documented generalized `[datasets.*]` definitions as a future direction at
-  that time.
-- Does not change functional training, selection, export, prediction, or CLI
-  behavior.
+- `[models].active` controls which model families are trained.
+- Export uses strict dataset loading and requires explicit `X` in individual
+  `.npz` files.
+- Installed-wheel smoke validation runs and exports a toy `[datasets.toy]`
+  workflow outside the repository checkout.
+- The public API remains `Config`, `load_datasets`, `plot_cv_distributions`,
+  `predict`, and `__version__`.
+- Legacy `reduction_type` + `level` export rows remain supported, but
+  individual legacy `.npz` files must contain an explicit `X` array.
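
The public-API guarantee in the highlights can be checked mechanically. The sketch below generalizes the `__all__` assertion that the removed inline CI step ran against the installed wheel; the expected names and their order are copied from that step, while the helper name `check_public_api` is hypothetical and the new smoke script may verify this differently.

```python
import types

EXPECTED_API = ["Config", "load_datasets", "plot_cv_distributions", "predict", "__version__"]


def check_public_api(module: types.ModuleType, expected: list[str] = EXPECTED_API) -> None:
    """Assert that a module's `__all__` matches the expected names exactly."""
    exported = list(getattr(module, "__all__", []))
    assert exported == expected, f"unexpected __all__: {exported}"
    # Every advertised name must also be defined on the module.
    missing = [name for name in expected if not hasattr(module, name)]
    assert not missing, f"names listed but not defined: {missing}"
```

With the wheel installed, the call would be `import melite; check_public_api(melite)`.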

## Validation Targets

@@ -32,8 +22,9 @@ Before release, validate:
```bash
mkdocs build --strict
python -m pytest tests/ -v --basetemp=.review_pytest_tmp -o cache_dir=.review_pytest_cache
-python -m build
+python -m build --no-isolation
python -m twine check dist/*
+python scripts/smoke_install_wheel.py
melite --help
melite run --help
melite export --help
@@ -42,6 +33,6 @@ melite --version

## Full Changelog

-The complete version history is maintained in the repository changelog:
+The complete release history is maintained in the repository changelog:

--8<-- "CHANGELOG.md"