New Feature - API Framework#6

Merged
BenjaminIsaac0111 merged 5 commits into main from Feature-API-Design
Mar 11, 2026

@BenjaminIsaac0111
Owner

I have refactored the package and added a shiny new API so that others can write their own data recipes and run inference on trained models.

This is the first version of the API, so features may be added, removed, or broken until I start thinking more seriously about a production pipeline for this codebase.

The HEST1k dataset now exists more as a way for others to see what type of data the models need to train. In a sense, this is just a basic dataloader recipe, serving as a reference and a benchmark for new model architectures I may come up with in the future!

The documentation has been updated with these new API additions and should help those who want to start training their own models.

…ove to separate concerns and disentangle the main project from the HEST1k data, treating HEST1k more as a developmental dataset for the project.
…d Data Loader class template.

This commit refactors the codebase from an experiment-focused structure into a general-purpose framework, to encourage users to develop their own models and pipelines. Key changes include the definition of a clear data contract, the introduction of a high-level Trainer class, and the isolation of HEST-specific logic into a recipe.

Core Changes:

- Introduced SpatialDataset abstract base class to define a standard data contract (features, gene_counts, rel_coords) for all spatial transcriptomics datasets.
- Implemented a high-level Trainer class to orchestrate the training lifecycle, including LR scheduling (warmup + cosine), AMP, and checkpointing.
- Added a flexible callback system to the Trainer, including a built-in EarlyStoppingCallback.
- Created a recipes/hest namespace to isolate HEST-specific dataset logic and utilities, maintaining backward compatibility through re-export facades.
- Added a "Bring Your Own Data" (BYOD) guide and template for custom datasets.
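
A data contract of the kind described above can be pictured as an abstract base class that checks every sample for the three named fields. The following is only a minimal sketch under stated assumptions: the field names `features`, `gene_counts`, and `rel_coords` come from this description, while `SpatialDatasetSketch`, `load_sample`, and `ToySpots` are hypothetical names invented for illustration and are not the package's actual API. It uses plain Python lists so it runs without any dependencies.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Field names taken from the PR description; everything else is illustrative.
REQUIRED_KEYS = ("features", "gene_counts", "rel_coords")


class SpatialDatasetSketch(ABC):
    """Hypothetical stand-in for a data-contract base class."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of samples in the dataset."""

    @abstractmethod
    def load_sample(self, index: int) -> Dict[str, List]:
        """Return one sample as a dict (hypothetical hook name)."""

    def __getitem__(self, index: int) -> Dict[str, List]:
        # Bounds check so plain iteration over the dataset terminates.
        if not 0 <= index < len(self):
            raise IndexError(index)
        sample = self.load_sample(index)
        # Enforce the contract: every sample must carry all three fields.
        missing = [key for key in REQUIRED_KEYS if key not in sample]
        if missing:
            raise KeyError(f"sample {index} is missing fields: {missing}")
        return sample


class ToySpots(SpatialDatasetSketch):
    """Toy dataset with constant values, standing in for a custom recipe."""

    def __init__(self, n_samples: int, n_spots: int = 4,
                 feat_dim: int = 3, n_genes: int = 2) -> None:
        self.n_samples = n_samples
        self.n_spots = n_spots
        self.feat_dim = feat_dim
        self.n_genes = n_genes

    def __len__(self) -> int:
        return self.n_samples

    def load_sample(self, index: int) -> Dict[str, List]:
        return {
            # One feature vector per spot.
            "features": [[0.0] * self.feat_dim for _ in range(self.n_spots)],
            # One gene-count vector per spot.
            "gene_counts": [[0] * self.n_genes for _ in range(self.n_spots)],
            # 2D coordinates per spot (made-up layout along the x axis).
            "rel_coords": [[float(i), 0.0] for i in range(self.n_spots)],
        }
```

Validating in `__getitem__` rather than in each subclass means a custom dataset only has to implement `__len__` and the loading hook; whether the real base class works this way is not stated here.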

API & DX:

- Exposed Trainer and SpatialDataset in the top-level package for easier access.
- Standardized training engine functions (train_one_epoch, validate) to be agnostic to specific data sources.
- Added comprehensive unit tests for the Trainer lifecycle, callbacks, and resumption.
- Updated documentation (API.md) with detailed Training API and BYOD sections.
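
To make the "warmup + cosine" schedule and the early-stopping callback mentioned above more concrete, here is a dependency-free sketch. It assumes a linear warmup followed by cosine decay and a patience-based stopping rule on validation loss; the exact formulas, parameter names, and hook names used by the Trainer are not given in this description, so `warmup_cosine_lr`, `EarlyStoppingSketch`, `on_validation_end`, and `run_epochs` are all hypothetical.

```python
import math
from typing import Sequence


def warmup_cosine_lr(step: int, total_steps: int, warmup_steps: int,
                     base_lr: float, min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        # Ramp linearly; the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStoppingSketch:
    """Stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def on_validation_end(self, val_loss: float) -> bool:
        """Return True when training should stop (hypothetical hook name)."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_epochs(val_losses: Sequence[float], callback: EarlyStoppingSketch) -> int:
    """Toy loop: feed recorded validation losses, return the last epoch run."""
    for epoch, loss in enumerate(val_losses):
        if callback.on_validation_end(loss):
            return epoch
    return len(val_losses) - 1
```

With `patience=3`, feeding the losses `[1.0, 0.9, 0.95, 0.93, 0.92, 0.5]` stops at epoch index 4, because epochs 2 to 4 all fail to beat the best loss of 0.9. The final value of 0.5 is never seen, which is the usual trade-off of a patience-based rule.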

Verified with 166 passing tests across the full suite.
- Add code-level attribution in backbones.py for the foundation models.
@BenjaminIsaac0111 BenjaminIsaac0111 self-assigned this Mar 11, 2026
@BenjaminIsaac0111 BenjaminIsaac0111 merged commit d6cd678 into main Mar 11, 2026
2 checks passed
@BenjaminIsaac0111 BenjaminIsaac0111 deleted the Feature-API-Design branch March 11, 2026 19:16