Long-read transcriptome assembly reveals vast transcriptional complexity in the placenta associated with metabolic and endocrine function
This repository contains the complete codebase and documentation for the analyses presented in our manuscript:
Long-read transcriptome assembly reveals vast transcriptional complexity in the placenta associated with metabolic and endocrine function
Sean T. Bresnahan, Hannah E. J. Yong, William H. Wu, Sierra Lopez, Jerry Kok Yen Chan, Frédérique White, Pierre-Étienne Jacques, Marie-France Hivert, Shiao-Yng Chan, Michael I. Love, Jonathan Y. Huang, Arjun Bhattacharya
bioRxiv, 2025.06.26.661362
https://doi.org/10.1101/2025.06.26.661362
The pipeline integrates long-read sequencing, orthogonal validation datasets, and short-read-based transcriptomics to characterize placental transcript diversity and its relevance to metabolic and endocrine function in pregnancy.
To visualize transcript structures and explore GDM association results, see: lr-placenta-transcriptome-viz, a Shiny app for visualizing transcript structures and summary statistics for GDM and birth weight effects mediated via long-read placental transcriptomics.
📄 GTF of all transcripts assembled by ESPRESSO
📄 SQANTI3 classification table of all transcripts assembled by ESPRESSO
📄 GTF of filtered, high-confidence transcripts
📄 SQANTI3 classification table of filtered, high-confidence transcripts
📄 I_dataset_preparation.md
Prepares CAGE-seq, DNase-seq, poly(A) site, and short-read splice junction data to validate long-read transcript structures.
📄 II_assembly.md
Processes Oxford Nanopore data through basecalling, trimming, correction, alignment, and de novo transcriptome assembly using ESPRESSO.
-
📜
IIIB_raw_assembly_analysis.R
Compares assembly to GENCODE v45 and GTEx v9 across gene/isoform counts and structural categories. -
📜
IIIC_assembly_QC_and_filtering.R
Applies SQANTI3-based multi-layered filtering with CAGE-seq, poly(A) support, and splice junction evidence. Outputs filtered transcriptome and QC plots.
📄 IV_short_read_analyses.md
Explores the biological and clinical relevance of the filtered assembly using RNA-seq data from GUSTO (N=200) and Gen3G (N=152).
-
📜
IVB_short-read_txomics.R
Assesses quantification accuracy, expression patterns and coding potential vs GENCODE & GENCODE+. -
📜
IVC_DGE_DTE.R
Identifies differentially expressed genes and transcripts associated with gestational diabetes mellitus (GDM). -
📜
IVD_mediation_and_GOEA.R
Tests for mediation of GDM effects on birth weight by transcript expression, and performs GO enrichment analysis.
- Transcriptome Annotation: Filtered GTF, FASTA, classification tables
- Figures: All primary and supplementary figures (Figures 1–5, S1–S8)
- Expression Data: Gene and transcript-level quantifications for GUSTO and Gen3G
- Mediation & Enrichment Results: Individual and PC-based mediators, enriched GO terms
For questions, please contact:
📧 Sean T Bresnahan — stbresnahan@mdanderson.org
🧬 Bhattacharya Lab for Computational Genomics