Transcript-level RNA-seq data analysis using Salmon and edgeR v4

This repository contains scripts to reproduce the analysis in the following workflow paper:

Differential transcript expression and differential transcript usage using Salmon and edgeR v4

Xueyi Dong, Lizhong Chen, Junli Nie, Gordon K. Smyth, Yunshun Chen

Example RNA-seq data availability

The example data used in this protocol are the Illumina RNA-seq data from Dong et al., available from the Gene Expression Omnibus (GEO) under accession number GSE172421.

Example analysis report / expected output

An HTML report generated by knitting workflow.Rmd is available in the docs folder and can also be viewed online at https://chenlaboratory.github.io/DTE_DTU_workflow/. The report also includes the elapsed time of each R-based stage and the R session information from the test run.

Overview of the repository

data: This folder holds all data required to run the workflow. RNA-seq reads should be saved into data/reads, while the reference genome and gene annotation files should be downloaded into data/reference. We provide the sample information and experimental design spreadsheet (data/targets.txt) for the example data. We also provide the R object (data/counts.RDS) containing the imported Salmon output of our test run, so that the exact results of the test run can be reproduced.
docs: This folder contains an example HTML report we generated by running this workflow.
setup: This folder contains scripts for the experimental setup, including downloading and preparing data files and installing required software packages.
workflow: This folder contains scripts for all the steps in this analysis workflow.
workflow.Rmd: This RMarkdown document organizes the workflow into sequential chunks, each sourcing the R script for a specific stage. Knitting this document runs all R-based stages (Stages 2-6) and generates an HTML report containing the results.
test_workflow.sh: This script can be used to run the whole workflow, including pulling the Docker container, downloading and preparing required data, running Salmon for quantification, and knitting the Rmd document. This script is written using SLURM syntax and requires the apptainer module.
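
As a rough illustration of what "SLURM syntax" combined with the apptainer module can look like, the sketch below shows a job-script preamble. It is not the content of test_workflow.sh: the job name, the time limit, and the THREADS variable are assumptions made for illustration, and the resource values simply mirror the recommended minimums of 8 CPU cores and 20 GB of RAM.

```shell
#!/bin/bash
# Illustrative SLURM preamble (assumed values, not taken from test_workflow.sh).
#SBATCH --job-name=dte_dtu_workflow
#SBATCH --cpus-per-task=8
#SBATCH --mem=20G
# The time limit below is a placeholder; adjust it for your cluster.
#SBATCH --time=24:00:00

# Load Apptainer only where an environment-module system is present.
if type module >/dev/null 2>&1; then
    module load apptainer || echo "WARN: could not load the apptainer module" >&2
fi

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given; fall back to 8 elsewhere.
THREADS=${SLURM_CPUS_PER_TASK:-8}
echo "Using $THREADS threads"
```

A script with such a preamble is submitted with sbatch; outside SLURM the #SBATCH lines are ordinary comments, so the same file can still be run directly with bash.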

System requirements

Operating system and hardware

This protocol is designed to run on a Linux operating system. We recommend at least 20 GB of RAM and 8 CPU cores.

This protocol has been tested on Ubuntu 24.04.4 LTS by running test_workflow.sh.
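
A quick way to compare a machine against the recommended minimums is sketched below. check_resources is a hypothetical helper, not part of this repository, and the thresholds default to the values recommended above. On Linux the total memory is read from /proc/meminfo, which reports slightly less than the installed RAM, so a nominal 20 GB machine may show up as 19 GB.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check against the recommended 20 GB RAM and 8 CPU cores.

# Usage: check_resources MEM_GB CORES [MIN_MEM_GB] [MIN_CORES]
check_resources() {
    local mem_gb=$1 cores=$2 min_mem_gb=${3:-20} min_cores=${4:-8}
    local status=0
    if [ "$mem_gb" -lt "$min_mem_gb" ]; then
        echo "WARN: ${mem_gb} GB RAM is below the recommended ${min_mem_gb} GB"
        status=1
    fi
    if [ "$cores" -lt "$min_cores" ]; then
        echo "WARN: ${cores} cores is below the recommended ${min_cores}"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK: ${mem_gb} GB RAM, ${cores} cores"
    fi
    return "$status"
}

# MemTotal is given in kB; convert to whole GB (rounded down).
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    check_resources "$mem_gb" "$(nproc)" || true
fi
```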

Software

Docker container image

We have provided a pre-configured Docker container environment containing all the necessary software tools and R packages. The image is publicly hosted on Docker Hub.

Dependent software

For users who prefer to maintain a native environment or wish to install the required software manually, detailed installation instructions can be found in the manuscript of the protocol.

The protocol is dependent on the following software:

SRA Toolkit software (version 3.1.0 or later)
Pigz software (version 2.8 or later)
GffRead software (version 0.12.7 or later)
Salmon software (version 1.10.0 or later)
R (version 4.5.2 or later)
R packages:
- edgeR version 4.8.2
- limma version 3.66.0
- rtracklayer version 1.70.1
- RColorBrewer version 1.1-3
- ggplot2 version 4.0.2
- Gviz version 1.54.0
- pheatmap version 1.0.13
- readr version 2.2.0
- jsonlite version 2.0.0

Typically, installing all the necessary software takes about 2 to 15 minutes, depending on your computer and network environment.

The protocol has been tested with SRA Toolkit version 3.1.0, GffRead version 0.12.7, Salmon version 1.10.0, and R version 4.5.2. The R session information from our test run can be found in the "Session information" section of the expected output.
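
For a native installation, a short check that the command-line tools are on the PATH can save a failed run later. The sketch below is illustrative only: missing_tools and version_ge are hypothetical helpers, and the executable names (prefetch and fasterq-dump for SRA Toolkit, pigz, gffread, salmon, Rscript) are the usual ones for these packages rather than names confirmed from the repository scripts. version_ge relies on sort -V from GNU coreutils.

```shell
#!/usr/bin/env bash
# Print the name of every given tool that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

missing=$(missing_tools prefetch fasterq-dump pigz gffread salmon Rscript)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All command-line tools were found."
fi

# Example comparison against the minimum Salmon version listed above.
if version_ge "1.10.2" "1.10.0"; then
    echo "Salmon 1.10.2 satisfies the 1.10.0 minimum"
fi
```

The installed version string itself still has to be obtained from each tool (for example from its --version output) before passing it to version_ge.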

Instructions for running the workflow

Running the workflow on example data

All the scripts in this repository should be run from the project root directory.

To reproduce the workflow, follow these steps in order:

Clone this repository and navigate to the directory of the repository.
Install the required software tools:

Option A: Install the required software manually

Follow the instructions in the "Equipment setup" section in the paper to install SRA Toolkit, pigz, GffRead, Salmon and R.
Run setup/install_R_packages.R in R to install required R packages.

Option B: Using our Docker container image

For Docker users, the image can be retrieved using the following command: docker pull xueyidong/dte_dtu_workflow:latest
For users on High-Performance Computing (HPC) clusters where Docker is unavailable, the image is also fully compatible with Apptainer. It can be pulled and converted into a Singularity Image Format (.sif) file using the following command: apptainer pull dte_dtu_workflow.sif docker://xueyidong/dte_dtu_workflow:latest

Download and prepare data:

Download human T2T-CHM13v2.0 reference genome sequence and annotation file into the directory data/reference. The script setup/down_annotation.sh can be used for downloading the reference data.
Download and prepare the example RNA-seq reads data. Run download_and_prepare_sra.sh to download the data and merge_tech_batch.sh to merge the technical replicates.

Run the scripts for each step of the workflow in the workflow folder, in order. For stages 2-6, we recommend knitting the RMarkdown document workflow.Rmd to generate an HTML report.
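
The knitting step can be sketched as a single command run inside the container from the project root. This is an illustration under several assumptions, not the repository's own script: in_project_root and run_in_container are hypothetical helpers, the /work mount point is arbitrary, the image is assumed to accept a command after its name, and the rmarkdown package is assumed to be installed in the image. By default the sketch only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
IMAGE="xueyidong/dte_dtu_workflow:latest"   # image name given in this README
SIF="dte_dtu_workflow.sif"                  # file written by apptainer pull
DRY_RUN=${DRY_RUN:-1}                       # 1 = print the command, 0 = run it

# Succeed if the directory contains the top-level entries listed in the overview.
in_project_root() {
    local dir=${1:-.}
    [ -f "$dir/workflow.Rmd" ] && [ -d "$dir/data" ] &&
        [ -d "$dir/setup" ] && [ -d "$dir/workflow" ]
}

# Build the container command for the chosen runtime, then print or run it.
run_in_container() {
    local runtime=$1
    shift
    case "$runtime" in
        docker)    set -- docker run --rm -v "$PWD:/work" -w /work "$IMAGE" "$@" ;;
        apptainer) set -- apptainer exec "$SIF" "$@" ;;
        *)         echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
    if [ "$DRY_RUN" = 1 ]; then
        # Display only: the words are joined with spaces, so quoting is not preserved.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

if in_project_root .; then
    run_in_container docker Rscript -e 'rmarkdown::render("workflow.Rmd")'
else
    echo "Run this from the project root directory."
fi
```

On a cluster, replacing docker with apptainer in the last call uses the .sif file instead. Apptainer typically makes the current directory available inside the container by default, which is why no explicit mount appears in that branch.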

Using the workflow on your data

To apply the workflow to your own data, we recommend the following:

Choose a suitable version of the reference genome and annotation for your data.
Prepare a targets file that records your experimental design and sample information. The expected format is shown in data/targets.txt.
Adjust the design matrix (stage 4, step 24) and contrasts (stage 5, step 29) according to your experimental design. We recommend the following article as a guide on how to set up your design and contrasts properly: Law et al., A guide to creating design matrices for gene expression experiments, F1000Research, 2020, DOI: 10.12688/f1000research.27893.1.

Notes

Due to the stochastic nature of Salmon’s quasi-mapping algorithm and Gibbs resampling, the results of each Salmon quantification run can differ slightly. These differences carry through to the downstream analysis: different transcripts may be filtered out, and a slightly different number of differentially expressed or differentially used transcripts may be detected.
To achieve strict reproducibility, so that results exactly match our example run, or to test the workflow without the time-consuming and computationally intensive data download and Salmon quantification steps, you may use our R object data/RDS/counts.RDS, which contains the imported Salmon output. To use it, replace the command that imports the Salmon output in Stage 3 (line 11 of workflow/3_count_preprocess.R), counts <- catchSalmon(file.path(salmon_dir, samples)), with the command that reads the R object: counts <- readRDS("data/RDS/counts.RDS").
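
The replacement described above can be previewed from the shell without modifying the script. The sketch below assumes the import line in workflow/3_count_preprocess.R starts at the beginning of a line exactly as quoted; patch_import is a hypothetical helper that writes the patched script to standard output.

```shell
#!/usr/bin/env bash
# Print the given R script with the catchSalmon() import line replaced by readRDS().
patch_import() {
    sed 's|^counts <- catchSalmon(.*)[[:space:]]*$|counts <- readRDS("data/RDS/counts.RDS")|' "$1"
}

src=workflow/3_count_preprocess.R
if [ -f "$src" ]; then
    # diff exits with status 1 when the files differ, which is the expected case here.
    patch_import "$src" | diff "$src" - || true
else
    echo "$src not found; run this from the project root directory."
fi
```

If the preview shows only the intended one-line change, the output of patch_import can be redirected to a new file and that file used in place of the original script.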
