AIxOmics/HUBSTA_Analysis

HUBSTA_Analysis

Human Unified Brain Spatial Transcriptomic Analysis: Human Fetal Brain Development Atlas Pipeline

A comprehensive spatial transcriptomics analysis pipeline for human fetal brain development, built on Stereo-seq (Spatial Enhanced Resolution Omics-sequencing) data. This repository covers the full workflow from cell segmentation to downstream biological interpretation, including deep learning-based embedding, GPU-accelerated clustering, region-specific DEG analysis, ligand-receptor interaction inference, spatial GRN inference, pseudotime analysis, and 3D visualization.


Overview of Brain Regions Analyzed

| Region | Directory | Key Analyses |
| --- | --- | --- |
| Hippocampus | 05.hip_analysis/ | DEG, Ligand-Receptor, Pseudotime, Tracks plots, Spatial gene plots |
| Thalamus | 06.thalamus_analysis/ | DEG, Ligand-Receptor, Spatial gene plots, Tangram mapping |
| Mid-Hindbrain (MHB) | 07.mhb_analysis/ | DEG, Ligand-Receptor, Tracks plots |
| Cerebellum | 08.cerebellum_analysis/ | DEG, Single-cell annotation, Spatial annotation, Pseudotime, Ligand-Receptor, Circos plots |
| Spatial GRN | 09.grn_analysis/ | GPU SpaGRN submodule, whole-brain and region-specific GRN notebooks, TF/regulon visualization |
| gsMap Analysis | 10.gsmap_analysis/ | Spatial genetic enrichment across bin100, cell-bin, single-cell, and pseudobulk representations |
| Cortex | 12.3D_ply_plot/ | 3D PLY mesh gene visualization |

Pipeline Workflow

┌─────────────────────────┐
│  01. Cell Segmentation  │  ← Stereo-seq raw data (.gem, .h5ad)
│     (Stereopy + ONNX)   │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  02. FuseMap Integration│  ← Spatial data integration & embedding
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  03. DMT-HI Latent      │  ← Deep Manifold Transformation
│     Representation      │     with Hyperbolic embedding
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  04. RAPIDS GPU         │  ← GPU-accelerated Leiden clustering
│     Clustering          │     on DMT embeddings
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  05-08. Region Analysis │  ← Per-region downstream analyses
│                         │     · DEG / Gene expression plots
│                         │     · Ligand-Receptor (CCI) inference
│                         │     · Pseudotime trajectory
│                         │     · Spatial tracks visualization
│                         │     · Cell type annotation
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  09. Spatial GRN        │  ← GPU SpaGRN inference and
│     Analysis            │     TF/regulon visualization
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  10. gsMap Analysis     │  ← Spatial GWAS enrichment on
│                         │     bin100/cell-bin/SC/pseudobulk
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  12. 3D PLY Plot        │  ← 3D mesh visualization of
│     3D Visualization    │     gene expression on brain models
└─────────────────────────┘
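
The stage order in the diagram can also be written down as data, which may help when scripting partial reruns. The sketch below is illustrative only: `STAGES` and `stages_up_to` are hypothetical names that do not exist in the repository, and the directory names are copied from the repository layout. The diagram draws a linear flow, but the per-region analyses (05-08) are independent of one another, so their relative order in the list is arbitrary.

```python
# Hypothetical helper, not part of the repository: the stage order from the
# workflow diagram, paired with the directory that holds each stage.
STAGES = [
    ("Cell Segmentation", "01.cell_segmentation"),
    ("FuseMap Integration", "02.Fusemap"),
    ("DMT-HI Latent Representation", "03.DMT-HI"),
    ("RAPIDS GPU Clustering", "04.rapids_cluster"),
    ("Hippocampus Analysis", "05.hip_analysis"),
    ("Thalamus Analysis", "06.thalamus_analysis"),
    ("Mid-Hindbrain Analysis", "07.mhb_analysis"),
    ("Cerebellum Analysis", "08.cerebellum_analysis"),
    ("Spatial GRN Analysis", "09.grn_analysis"),
    ("gsMap Analysis", "10.gsmap_analysis"),
    ("3D PLY Plot", "12.3D_ply_plot"),
]


def stages_up_to(directory):
    """Return the directories of all stages up to and including `directory`."""
    dirs = [d for _, d in STAGES]
    if directory not in dirs:
        raise ValueError(f"unknown stage directory: {directory}")
    return dirs[: dirs.index(directory) + 1]


# Everything that the diagram places at or before GPU clustering.
print(stages_up_to("04.rapids_cluster"))
```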

Repository Structure

HUBSTA_Analysis/
├── 01.cell_segmentation/      # Cell segmentation (Stereopy + ONNX model)
│   ├── 01.cell_segmentation.py
│   ├── 02.cell_correct.py
│   ├── 03.make_cell_mask.py
│   └── cell_segmetation_v3.0.onnx
│
├── 02.Fusemap/                # FuseMap: spatial data integration framework
│   ├── FuseMap/
│   │   ├── config.py          # Configuration
│   │   ├── dataset.py         # Data loading
│   │   ├── model.py           # Core model architecture
│   │   ├── train.py           # Training pipeline
│   │   ├── loss.py            # Loss functions
│   │   ├── preprocess.py      # Preprocessing
│   │   ├── run.*.py           # Region-specific run scripts
│   │   └── paper_code/        # Paper reproduction code
│   │       ├── benchmark/     # Benchmarks (cell integration, gene imputation)
│   │       ├── cell_cell_interation/
│   │       ├── reference_mapping/
│   │       ├── universal_cell_type/
│   │       ├── universal_gene_embedding/
│   │       └── universal_tissue_region/
│
├── 03.DMT-HI/                 # Deep Manifold Transformation (Hyperbolic)
│   ├── main.py                # Main entry point
│   ├── model/                 # Model definitions
│   ├── conf_new/              # YAML config files
│   ├── dataloader/            # Data loaders
│   ├── manifolds/             # Hyperbolic manifold utilities
│   ├── sweep/                 # Hyperparameter sweep configs
│   └── run*.sh                # Run scripts
│
├── 04.rapids_cluster/         # GPU-accelerated clustering (RAPIDS)
│   ├── *_single_1.py          # Single-cell embedding clustering
│   ├── *_spatial_1.py         # Spatial embedding clustering
│   └── run.*.sh               # Region-specific run scripts
│
├── 05.hip_analysis/           # Hippocampus analysis
│   ├── for_deg_plot.py/R/sh   # DEG analysis
│   ├── Ligand-receptor_interaction_inference.py
│   ├── tracks_single.py / tracks_plot.py
│   └── *.ipynb                # Analysis notebooks
│
├── 06.thalamus_analysis/      # Thalamus analysis
│   ├── for_deg_plot.py/R/sh   # DEG analysis
│   ├── Ligand-receptor_interaction_inference.py
│   ├── tangram_yes.py         # Tangram mapping
│   └── *.ipynb                # Analysis notebooks
│
├── 07.mhb_analysis/           # Mid-Hindbrain analysis
│   ├── for_deg_plot.py/R/sh   # DEG analysis
│   ├── Ligand-receptor_interaction_inference.py
│   └── *.ipynb                # Analysis notebooks
│
├── 08.cerebellum_analysis/    # Cerebellum analysis
│   ├── *read_process_deg.ipynb
│   ├── *single_anno.py        # Single-cell annotation
│   ├── *spatial_anno.py       # Spatial annotation
│   ├── *tracks*.py            # Tracks visualization
│   ├── *Ligand-receptor*.py   # CCI inference
│   ├── *pseudotime_plot.ipynb # Pseudotime trajectory
│   ├── *circosplot*.ipynb     # Circos plots
│   └── *.ipynb
│
├── 09.grn_analysis/           # Spatial GRN analysis and GPU SpaGRN submodule
│   ├── SpaGRN/                 # Git submodule: https://github.com/DBinary/SpaGRN
│   ├── notebooks/              # Fetal brain GRN analysis notebooks
│   ├── docs/                   # Notebook and output inventories
│   ├── _3D_plot.py             # K3D 3D expression helper
│   └── requirements.txt
│
├── 10.gsmap_analysis/         # gsMap spatial genetic enrichment analysis
│   ├── bin100_3d/              # Whole-brain bin100 3D gsMap run
│   ├── cell_bin/               # Cell-bin gsMap runs and Cauchy plots
│   ├── single_cell/            # Single-cell gsMap run
│   ├── pseudobulk/             # Pseudobulk construction and gsMap run
│   ├── summary/                # Final comparison figures
│   └── shared/                 # Shared paths and annotation helpers
│
├── 12.3D_ply_plot/            # 3D PLY mesh visualization
│   ├── 2_mhb_gene_plot_*.py   # MHB 3D gene plot
│   ├── 3_cortex_gene_plot_*.py # Cortex 3D gene plot
│   ├── 04_brain_gene_plot_*.py # Whole brain 3D gene plot
│   ├── 4_makemesh.ipynb       # Mesh generation
│   └── 5_thalamus_gene_plot_*.py # Thalamus 3D gene plot
│
└── README.md

Key Features

1. Cell Segmentation (01.cell_segmentation)

  • Stereo-seq cell segmentation using the Stereopy framework
  • ONNX-accelerated deep learning model (v3.0) for cell boundary detection
  • Cell mask generation and correction

2. FuseMap: Spatial Data Integration (02.Fusemap)

  • Universal gene embedding for spatial transcriptomics
  • Tissue region identification and harmonization
  • Reference mapping (MERFISH, STARmap PLUS, Slide-seq V2, Stereo-seq)
  • Cell-cell interaction analysis
  • Gene imputation and targeted gene panel selection
  • Benchmark suite for cell integration and gene imputation

3. DMT-HI: Deep Manifold Transformation (03.DMT-HI)

  • Hyperbolic embedding for spatial transcriptomics data
  • Configurable transformer and MLP backbones
  • YAML-based experiment configuration
  • Weights & Biases (wandb) integration for experiment tracking
  • Hyperparameter sweep support

4. GPU Clustering (04.rapids_cluster)

  • GPU-accelerated Leiden clustering with RAPIDS (via rapids-singlecell; see Dependencies)
  • Processes both single-cell and spatial embeddings
  • Covers: Cerebellum, Cortex (P10), Hippocampus, Mid-Hindbrain, Thalamus, Whole Brain
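
As a rough illustration of what Leiden clustering on an embedding involves, the sketch below builds a kNN graph on `obsm['X_dmt']` and then runs Leiden on it. It is not the code in `04.rapids_cluster/`: the function name `leiden_on_embedding` and the default parameter values are assumptions made for this example, and the repository's scripts may differ. The `backend` argument is assumed to be a module with scanpy-style `pp.neighbors` and `tl.leiden` functions, such as `rapids_singlecell` on GPU or `scanpy` on CPU.

```python
def leiden_on_embedding(adata, backend, use_rep="X_dmt",
                        n_neighbors=15, resolution=1.0, key_added="leiden"):
    """Build a kNN graph on an embedding in adata.obsm, then run Leiden.

    `backend` is a module exposing scanpy-style `pp.neighbors` and
    `tl.leiden`, e.g. rapids_singlecell (GPU) or scanpy (CPU).
    """
    if use_rep not in adata.obsm:
        raise KeyError(f"obsm['{use_rep}'] not found; run the embedding step first")
    # Neighbours are computed in the embedding space, not on raw expression.
    backend.pp.neighbors(adata, n_neighbors=n_neighbors, use_rep=use_rep)
    # Cluster labels are written to adata.obs[key_added].
    backend.tl.leiden(adata, resolution=resolution, key_added=key_added)
    return adata


# Usage (requires an NVIDIA GPU with RAPIDS installed):
#   import anndata as ad
#   import rapids_singlecell as rsc
#   adata = ad.read_h5ad("slice.h5ad")   # hypothetical file name
#   leiden_on_embedding(adata, rsc)
# CPU fallback: `import scanpy as sc` and pass `sc` instead of `rsc`.
```

Passing the backend as an argument keeps the same call usable on machines without a GPU, which can be convenient for small test slices.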

5. Downstream Analysis (per brain region)

| Analysis | Description |
| --- | --- |
| DEG Analysis | Differential expression with Python + R (for_deg_plot.py/R/sh) |
| Ligand-Receptor (CCI) | Cell-cell interaction inference between spatial regions |
| Pseudotime | Trajectory analysis of cell lineages |
| Tracks Plot | Spatial gene expression tracks across tissue slices |
| Cell Annotation | Single-cell and spatial annotation of cell types |
| Tangram Mapping | Spatial mapping of cell types |
| Circos Plots | Circos-style visualization of cell-cell interactions |

6. Spatial GRN Analysis (09.grn_analysis)

  • Fetal brain spatial gene regulatory network analysis notebooks
  • GPU-rewritten SpaGRN inference code included as a git submodule
  • Whole-brain and region-specific TF/regulon summaries
  • GRN pathway, heatmap, and 3D visualization helpers

7. 3D Visualization (12.3D_ply_plot)

  • 3D mesh (PLY format) rendering of brain regions
  • Gene expression mapped onto 3D brain surfaces
  • Supports Cortex, Thalamus, Mid-Hindbrain, and whole brain

Dependencies

The pipeline relies on several key Python packages:

  • Spatial Transcriptomics: stereopy, spateo
  • Single-cell / Spatial: scanpy, anndata, squidpy
  • GPU Clustering: rapids-singlecell (requires NVIDIA GPU + RAPIDS)
  • Deep Learning: pytorch, pytorch-lightning
  • FuseMap: See 02.Fusemap/FuseMap/fusemap_environment.yaml
  • DMT-HI: See 03.DMT-HI/requirements.txt and install_env.sh
  • Visualization: matplotlib, seaborn, plotly
  • GRN inference: spagrn, pyscenic, arboreto, omicverse, gseapy, k3d
  • 3D: open3d, trimesh

For detailed environment setup, refer to:

  • 01.cell_segmentation/readme.md: Stereopy installation
  • 02.Fusemap/FuseMap/fusemap_environment.yaml: FuseMap conda environment
  • 03.DMT-HI/readme.md: DMT-HI environment
  • 09.grn_analysis/requirements.txt: spatial GRN analysis dependencies

Quick Start

Step 1: Cell Segmentation

cd 01.cell_segmentation
python 01.cell_segmentation.py
python 02.cell_correct.py
python 03.make_cell_mask.py
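
The three scripts run in sequence, and each presumably depends on the output of the one before it. If you prefer to drive them from Python and stop at the first failure, a minimal sketch could look like the following. The helper names `build_commands` and `run_segmentation` are hypothetical. The sketch also assumes the scripts take no command-line arguments, as in the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed under Step 1; they are run in this order.
SEGMENTATION_SCRIPTS = [
    "01.cell_segmentation.py",
    "02.cell_correct.py",
    "03.make_cell_mask.py",
]


def build_commands(python=sys.executable):
    """Return one argv list per segmentation script, in execution order."""
    return [[python, script] for script in SEGMENTATION_SCRIPTS]


def run_segmentation(workdir="01.cell_segmentation", dry_run=False):
    """Run the scripts in order from `workdir`, stopping at the first failure."""
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, cwd=Path(workdir), check=True)


if __name__ == "__main__":
    run_segmentation(dry_run=True)  # set dry_run=False to execute the scripts
```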

Step 2: Run FuseMap Integration

cd 02.Fusemap/FuseMap
python run.thalamus.py   # or run.cerebellum.py / run.hip.py / run.mhb.py / run.ctx.py

Step 3: Train DMT-HI Embedding

cd 03.DMT-HI
python main.py fit -c=conf_new/transf_cond_mnist.yaml

Step 4: GPU Clustering

cd 04.rapids_cluster
bash run.thalamus.sh     # or run.cerebellum.sh / run.Hippocampus.sh etc.

Step 5: Region-specific Analysis

# Navigate to the desired brain region directory (e.g., Hippocampus)
cd 05.hip_analysis

# Run DEG analysis
bash for_deg_plot.sh

# Run ligand-receptor inference
python Ligand-receptor_interaction_inference.py

# Open notebooks for downstream visualization
jupyter lab 1.ipynb

Step 6: Spatial GRN Analysis

git submodule update --init --recursive
cd 09.grn_analysis
jupyter lab notebooks/01_bin100_preprocessing_grn/01_03_Bin100_GRN_Inference.ipynb

Step 7: 3D Visualization

cd 12.3D_ply_plot
python 04_brain_gene_plot_20260306.py  # Whole brain 3D gene expression

Data

The pipeline expects Stereo-seq data in .h5ad (AnnData) format. Each brain region has multiple tissue slices (identified by slice_code), each containing:

  • Spatial coordinates (obsm['align_spatial_2d'])
  • Gene expression matrix
  • DMT latent embeddings (obsm['X_dmt'], obsm['X_dmt_highdim'])
  • Leiden cluster assignments
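
Before starting a region-specific analysis, it can be worth checking that a slice actually carries these fields. The helper below is a sketch, not part of the repository: `missing_fields` is a hypothetical name, and the cluster column name `leiden` is an assumption, since the column name is not stated here. It works on any object with AnnData-style `obsm` and `obs` attributes.

```python
# obsm keys named in the list above; the cluster column name is an assumption.
REQUIRED_OBSM = ("align_spatial_2d", "X_dmt", "X_dmt_highdim")


def missing_fields(adata, cluster_key="leiden"):
    """List expected fields that are absent from an AnnData-like object."""
    missing = [f"obsm['{key}']" for key in REQUIRED_OBSM if key not in adata.obsm]
    # For a real AnnData, `in adata.obs` checks the DataFrame's column names.
    if cluster_key not in adata.obs:
        missing.append(f"obs['{cluster_key}']")
    return missing


# Usage with a real file (hypothetical path):
#   import anndata as ad
#   adata = ad.read_h5ad("thalamus_slice.h5ad")
#   print(missing_fields(adata) or "all expected fields present")
```

The DMT embeddings and Leiden labels are produced by Steps 3 and 4, so a freshly segmented slice is expected to report them as missing.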

The spatial GRN notebooks in 09.grn_analysis/ also expect GRN resources and intermediate AnnData outputs such as GRN_resource/, Process_Data/, Output/, and Figure/. Large data and generated outputs are intentionally excluded from the repository.
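
Because these directories are not shipped with the repository, a quick existence check before opening the notebooks can save a failed run. The sketch below assumes the four directories sit directly under `09.grn_analysis/`, which is a guess about the layout. `missing_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Directory names taken from the paragraph above; their location directly
# under 09.grn_analysis/ is an assumption.
EXPECTED_DIRS = ("GRN_resource", "Process_Data", "Output", "Figure")


def missing_dirs(root="09.grn_analysis"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    for name in missing_dirs():
        print(f"missing: 09.grn_analysis/{name}/")
```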


Citation

If you use this pipeline or any of its components in your research, please cite the relevant tools:


License

This project is for research purposes. See individual component directories for specific licenses (e.g., 02.Fusemap/FuseMap/LICENSE).


Author & Contact

For questions or collaboration inquiries, please open an issue or contact the repository maintainer.
