New Feature - API Framework by BenjaminIsaac0111 · Pull Request #6 · BenjaminIsaac0111/SpatialTranscriptFormer

BenjaminIsaac0111 · 2026-03-11T18:52:53Z

I have refactored the package and introduced a new API that lets others write their own data recipes and run inference on trained models.

This is the first version of the API, so features may be added or removed, and things may break, until I start thinking more seriously about a production pipeline for this codebase.

The HEST1k dataset now exists more as a way for others to see what type of data the models need to train. In a sense, this is just a basic dataloader recipe, serving as a reference and a benchmark for new model architectures I may come up with in the future!
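As a rough illustration of what a data recipe has to provide, here is a minimal sketch built on the three contract fields named in this PR's commit notes (features, gene_counts, rel_coords). Everything else is an assumption made for illustration: the class names `SpatialDatasetSketch` and `InMemoryRecipe`, the `load_sample` hook, and the use of plain Python lists are hypothetical and are not the package's actual `SpatialDataset` API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Sample = Dict[str, List[float]]


class SpatialDatasetSketch(ABC):
    """Hypothetical stand-in for the package's SpatialDataset base class."""

    # The three fields named in the PR's data contract.
    REQUIRED_KEYS = ("features", "gene_counts", "rel_coords")

    @abstractmethod
    def __len__(self) -> int:
        """Number of samples (for example, spots or patches)."""

    @abstractmethod
    def load_sample(self, index: int) -> Sample:
        """Return one sample as a dict; subclasses implement the I/O."""

    def __getitem__(self, index: int) -> Sample:
        # Enforce the contract in one place so every recipe is checked.
        sample = self.load_sample(index)
        missing = [key for key in self.REQUIRED_KEYS if key not in sample]
        if missing:
            raise KeyError(f"sample {index} is missing contract fields: {missing}")
        return sample


class InMemoryRecipe(SpatialDatasetSketch):
    """Toy recipe that serves pre-computed values held in memory."""

    def __init__(self, features, gene_counts, rel_coords):
        if not (len(features) == len(gene_counts) == len(rel_coords)):
            raise ValueError("all three fields need the same number of samples")
        self._features = features
        self._gene_counts = gene_counts
        self._rel_coords = rel_coords

    def __len__(self) -> int:
        return len(self._features)

    def load_sample(self, index: int) -> Sample:
        return {
            "features": self._features[index],
            "gene_counts": self._gene_counts[index],
            "rel_coords": self._rel_coords[index],
        }


# Two made-up samples: 2-d features, 3 genes, 2-d relative coordinates.
toy = InMemoryRecipe(
    features=[[0.1, 0.2], [0.3, 0.4]],
    gene_counts=[[5, 0, 1], [2, 7, 0]],
    rel_coords=[[0.0, 0.0], [1.0, 0.0]],
)
```

Putting the contract check in the base class means a new recipe only has to implement `__len__` and `load_sample`. The BYOD guide and template added in this PR are the reference for the real interface.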

The documentation has been updated with these new API additions and should help those who want to start training their own models.

…d Data Loader class template.

This commit refactors the codebase from an experiment-focused structure into a general-purpose framework, to encourage users to develop their own models and pipelines. Key changes include the definition of a clear data contract, the introduction of a high-level Trainer class, and the isolation of HEST-specific logic into a recipe.

Core Changes:
- Introduced SpatialDataset abstract base class to define a standard data contract (features, gene_counts, rel_coords) for all spatial transcriptomics datasets.
- Implemented a high-level Trainer class to orchestrate the training lifecycle, including LR scheduling (warmup + cosine), AMP, and checkpointing.
- Added a flexible callback system to the Trainer, including a built-in EarlyStoppingCallback.
- Created a recipes/hest namespace to isolate HEST-specific dataset logic and utilities, maintaining backward compatibility through re-export facades.
- Added a "Bring Your Own Data" (BYOD) guide and template for custom datasets.

API & DX:
- Exposed Trainer and SpatialDataset in the top-level package for easier access.
- Standardized training engine functions (train_one_epoch, validate) to be agnostic to specific data sources.
- Added comprehensive unit tests for the Trainer lifecycle, callbacks, and resumption.
- Updated documentation (API.md) with detailed Training API and BYOD sections.

Verified with 166 passing tests across the full suite.
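The "warmup + cosine" LR scheduling mentioned above is commonly implemented as a linear ramp followed by a half-cosine decay. The following is a minimal sketch of that general technique, not the Trainer's actual code: the function name `warmup_cosine_lr`, its parameters, and the per-step (rather than per-epoch) granularity are all assumptions for illustration.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Hypothetical helper: learning rate for a 0-indexed training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example schedule: 5 warmup steps, 15 steps in total, peak LR of 1e-3.
schedule = [warmup_cosine_lr(s, 15, 5, 1e-3) for s in range(16)]
```

In this sketch the rate climbs to `base_lr` by the end of warmup, is halfway between `base_lr` and `min_lr` at the midpoint of the decay phase, and reaches `min_lr` at `total_steps`. How the Trainer combines its schedule with AMP and checkpoint resumption is described in API.md rather than here.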

BenjaminIsaac0111 added 5 commits March 5, 2026 15:39

- Introducing initial inference API. Subject to change, will likely m…

ad7e6dd

…ove to separate concerns and disentangle the main project from the HEST1k data, treating HEST1k more as a developmental dataset for the project.

docs: clarify licensing and add third-party attributions

5cb9674

- Add code-level attribution in backbones.py for the foundation models.

- Minor update to the docs for clarification on the API and Dataset R…

c59bd58

…ecipes.

- Reformatted the 6 files identified for the CI/CD check (black formatte…

bf48327

…r). Resolves the formatting failures.

BenjaminIsaac0111 self-assigned this Mar 11, 2026

BenjaminIsaac0111 merged commit d6cd678 into main Mar 11, 2026
2 checks passed

BenjaminIsaac0111 deleted the Feature-API-Design branch March 11, 2026 19:16

BenjaminIsaac0111 merged 5 commits into main from Feature-API-Design
