Merged
…ove to separate concerns and disentangle the main project from the HEST1k data, treating HEST1k more as a developmental dataset for the project.
…d Data Loader class template.

This commit refactors the codebase from an experiment-focused structure into a general-purpose framework, to encourage users to develop their own models and pipelines. Key changes include the definition of a clear data contract, the introduction of a high-level Trainer class, and the isolation of HEST-specific logic into a recipe.

Core Changes:
- Introduced SpatialDataset abstract base class to define a standard data contract (features, gene_counts, rel_coords) for all spatial transcriptomics datasets.
- Implemented a high-level Trainer class to orchestrate the training lifecycle, including LR scheduling (warmup + cosine), AMP, and checkpointing.
- Added a flexible callback system to the Trainer, including a built-in EarlyStoppingCallback.
- Created a recipes/hest namespace to isolate HEST-specific dataset logic and utilities, maintaining backward compatibility through re-export facades.
- Added a "Bring Your Own Data" (BYOD) guide and template for custom datasets.

API & DX:
- Exposed Trainer and SpatialDataset in the top-level package for easier access.
- Standardized training engine functions (train_one_epoch, validate) to be agnostic to specific data sources.
- Added comprehensive unit tests for the Trainer lifecycle, callbacks, and resumption.
- Updated documentation (API.md) with detailed Training API and BYOD sections.

Verified with 166 passing tests across the full suite.
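The commit message above mentions LR scheduling with warmup followed by cosine decay. As a rough illustration of that idea (not the project's actual Trainer code), here is a minimal sketch that assumes a linear warmup and a cosine decay down to a floor value. The function name `warmup_cosine_lr` and all of its parameters are hypothetical.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Hypothetical helper: linear warmup to base_lr, then cosine decay to min_lr.

    This is an illustrative sketch, not the schedule implemented by the
    project's Trainer.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    if step < warmup_steps:
        # Linear ramp: the last warmup step reaches exactly base_lr.
        return base_lr * (step + 1) / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)

    # Cosine goes from 1 to -1, so the factor goes from 1 to 0.
    cosine_factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine_factor


if __name__ == "__main__":
    for s in (0, 5, 9, 10, 55, 100):
        print(s, round(warmup_cosine_lr(s, 100, 10, 1e-3), 6))
```

The schedule is written as a pure function of the step so it can be checked in isolation before being wired into an optimizer.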
- Add code-level attribution in backbones.py for the foundation models.
…r). Resolves the formatting failures.
I have refactored the package and added a shiny new API that lets others write their own data recipes and run inference on trained models.
This is the first version, so features may be added or removed, and things may break, until I start thinking more seriously about a production pipeline for this codebase.
The HEST1k dataset now exists more as a way for others to see what type of data the models need to train. In a sense, this is just a basic dataloader recipe, serving as a reference and a benchmark for new model architectures I may come up with in the future!
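To make the idea of a data recipe more concrete, here is a minimal sketch of what a dataset honoring the `features`, `gene_counts`, `rel_coords` contract might look like. It uses only the standard library and plain lists. The class names, method names, shapes, and the reading of each key (patch embedding, per-gene counts, relative spot coordinates) are assumptions for illustration, not the package's actual `SpatialDataset` API; see API.md for the real interface.

```python
from abc import ABC, abstractmethod
import random


class SpatialDatasetSketch(ABC):
    """Hypothetical stand-in for the data contract; not the project's class."""

    # Keys named in the PR; their exact types and shapes are assumed here.
    REQUIRED_KEYS = ("features", "gene_counts", "rel_coords")

    @abstractmethod
    def __len__(self):
        """Number of samples in the dataset."""

    @abstractmethod
    def load_sample(self, index):
        """Return a dict for one sample; subclasses implement this."""

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        sample = self.load_sample(index)
        # Enforce the contract so downstream code can rely on the keys.
        missing = [k for k in self.REQUIRED_KEYS if k not in sample]
        if missing:
            raise KeyError(f"sample {index} is missing keys: {missing}")
        return sample


class ToySpots(SpatialDatasetSketch):
    """Toy recipe that generates random data in place of real files."""

    def __init__(self, n_spots=4, feature_dim=8, n_genes=5, seed=0):
        rng = random.Random(seed)
        self._samples = [
            {
                "features": [rng.random() for _ in range(feature_dim)],
                "gene_counts": [rng.randint(0, 20) for _ in range(n_genes)],
                "rel_coords": [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)],
            }
            for _ in range(n_spots)
        ]

    def __len__(self):
        return len(self._samples)

    def load_sample(self, index):
        return self._samples[index]


if __name__ == "__main__":
    ds = ToySpots()
    print(len(ds), sorted(ds[0].keys()))
```

Checking the required keys in the base class means a custom recipe fails fast when it forgets a key, rather than failing later inside a training loop.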
The documentation has been updated with these new API additions and should help those who want to start training their own models.