scRNA-cross-donor-generalization

Random Cell-Level Splits Introduce Systematic Bias in scRNA-seq Cell Type Annotation

Overview

This project investigates how evaluation strategy affects the measured performance of scRNA-seq cell type classification models.

We compare:

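Scheme A (random cell-level split): commonly used, but biased, because cells from the same donor can land in both the training and test sets, so the test set is not independent of training
Scheme B (donor-held-out split): all cells from a test donor are excluded from training, giving a more realistic estimate of cross-donor performance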
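The difference between the two schemes can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code in src/scrna_benchmark: the function names (`random_cell_split`, `donor_held_out_split`, `shared_donors`) and the toy cell and donor labels are hypothetical. The `shared_donors` check makes the leakage concrete. Under Scheme B it is empty by construction. Under Scheme A, in the toy example below, every donor contributes more cells than the whole test set holds, so every donor that appears in the test set also appears in training.

```python
import random


def random_cell_split(cell_ids, test_frac=0.2, seed=0):
    """Scheme A: shuffle individual cells, ignoring which donor they came from."""
    rng = random.Random(seed)
    cells = list(cell_ids)
    rng.shuffle(cells)
    n_test = int(round(len(cells) * test_frac))
    return cells[n_test:], cells[:n_test]  # (train, test)


def donor_held_out_split(cell_ids, donors, test_donors):
    """Scheme B: every cell from a held-out donor goes to the test set."""
    held_out = set(test_donors)
    train = [c for c, d in zip(cell_ids, donors) if d not in held_out]
    test = [c for c, d in zip(cell_ids, donors) if d in held_out]
    return train, test


def shared_donors(train, test, donor_of):
    """Donors that contribute cells to both sides of a split (a leakage signal)."""
    return {donor_of[c] for c in train} & {donor_of[c] for c in test}


# Hypothetical toy example: 12 cells, 4 from each of 3 donors.
cells = [f"cell{i}" for i in range(12)]
donors = ["D1"] * 4 + ["D2"] * 4 + ["D3"] * 4
donor_of = dict(zip(cells, donors))

train_a, test_a = random_cell_split(cells, test_frac=0.25)
train_b, test_b = donor_held_out_split(cells, donors, test_donors=["D3"])

print(shared_donors(train_a, test_a, donor_of))  # non-empty: donors leak across the split
print(shared_donors(train_b, test_b, donor_of))  # set()
```

In a real benchmark the same idea is usually applied with several donors held out at once or rotated through folds, but the leakage check stays the same.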

Representations compared: highly variable genes (HVG), PCA, Harmony, and scVI
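As a rough illustration of the two simplest representations, the sketch below log-normalizes a count matrix, picks HVGs, and computes PCA with NumPy. It is a simplification under stated assumptions: HVGs here are just the genes with the highest variance, whereas tools such as Scanpy's `highly_variable_genes` use dispersion-based or Seurat-style criteria, and Harmony and scVI are not shown. The sketch fits gene selection and PCA on training cells only and then projects test cells into that space, so held-out cells do not shape the features; whether this project does the same is not stated here. All function names are hypothetical, and the counts are random Poisson draws, not real data.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / lib * target_sum)


def select_hvg(train_expr, n_top):
    """Indices of the n_top genes with the highest variance across training cells."""
    var = train_expr.var(axis=0)
    return np.argsort(var)[::-1][:n_top]


def fit_pca(train_x, n_comps):
    """Return (mean, components) from an SVD of the centered training matrix."""
    mean = train_x.mean(axis=0)
    _, _, vt = np.linalg.svd(train_x - mean, full_matrices=False)
    return mean, vt[:n_comps]


def project(x, mean, components):
    """Project cells onto principal axes learned from the training cells."""
    return (x - mean) @ components.T


# Synthetic counts (illustrative only): 50 training cells, 20 test cells, 200 genes.
rng = np.random.default_rng(0)
train_counts = rng.poisson(2.0, size=(50, 200))
test_counts = rng.poisson(2.0, size=(20, 200))

train_expr = log_normalize(train_counts)
test_expr = log_normalize(test_counts)

hvg = select_hvg(train_expr, n_top=50)  # chosen from training cells only
mean, comps = fit_pca(train_expr[:, hvg], n_comps=10)
train_pcs = project(train_expr[:, hvg], mean, comps)
test_pcs = project(test_expr[:, hvg], mean, comps)
print(train_pcs.shape, test_pcs.shape)  # (50, 10) (20, 10)
```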

The goal is to quantify how data leakage and dataset structure, such as the grouping of cells by donor, affect reported model performance.
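One way to quantify the effect is to train the same classifier under both schemes and report the accuracy gap. The sketch below does this with a 1-nearest-neighbour classifier on synthetic data in which each donor adds its own shift to every cell it contributes. The classifier, the data generator, and all function names are hypothetical stand-ins, not the models or datasets used in this project. Scheme B is implemented as leave-one-donor-out, one common variant of donor-held-out evaluation. The printed numbers depend on the random seed and say nothing about real datasets.

```python
import numpy as np


def one_nn_accuracy(x_train, y_train, x_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    d = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred == y_test).mean())


def random_split_accuracy(x, y, test_frac=0.2, seed=0):
    """Scheme A: hold out a random subset of cells, ignoring donors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(len(y) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    return one_nn_accuracy(x[train], y[train], x[test], y[test])


def leave_one_donor_out(x, y, donors):
    """Scheme B: each donor in turn is the test set; returns {donor: accuracy}."""
    scores = {}
    for d in np.unique(donors):
        test = donors == d
        scores[str(d)] = one_nn_accuracy(x[~test], y[~test], x[test], y[test])
    return scores


# Synthetic toy data (illustrative only): 2 cell types, 4 donors, 20 genes.
# Each donor adds its own random shift to every cell it contributes.
rng = np.random.default_rng(1)
n_donors, cells_per_type, n_genes = 4, 30, 20
type_means = rng.normal(0, 1.0, size=(2, n_genes))
xs, ys, ds = [], [], []
for d in range(n_donors):
    shift = rng.normal(0, 1.5, size=n_genes)
    for t in range(2):
        noise = rng.normal(0, 0.5, size=(cells_per_type, n_genes))
        xs.append(type_means[t] + shift + noise)
        ys += [t] * cells_per_type
        ds += [f"D{d}"] * cells_per_type
x, y, donors = np.vstack(xs), np.array(ys), np.array(ds)

acc_a = random_split_accuracy(x, y)
per_donor = leave_one_donor_out(x, y, donors)
acc_b = float(np.mean(list(per_donor.values())))
print(f"Scheme A: {acc_a:.3f}  Scheme B: {acc_b:.3f}  gap: {acc_a - acc_b:+.3f}")
```

A memorizing classifier such as 1-NN is a deliberately sensitive probe: under a random split, a test cell's nearest neighbour is often a cell from the same donor, which is exactly the shortcut a donor-held-out split removes.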

Project history

This repository began as a Spring 2026 Computational Genomics final project. The original course-project materials, including the proposal, presentation, report draft, notebooks, and results, are preserved in:

archive/

The top-level repository is under active development as a cleaned, extended, multi-cohort benchmark.
