24 commits
27db76f
add mermaid-js DAG.
dfbautista May 15, 2025
016b6e0
Changed orientation and color.
dfbautista May 15, 2025
fdecdcb
fix clearly wrong DAG.
dfbautista May 16, 2025
714e399
Merge remote-tracking branch 'origin/master' into readme_changes
dfbautista May 21, 2025
9b27185
added tests folder with unit tests and dry-run config
dfbautista Jun 4, 2025
1be66ff
Merge remote-tracking branch 'origin/master' into pytest
dfbautista Jun 5, 2025
f4af774
Added test raw data.
dfbautista Jun 5, 2025
3cb67e7
Updated test_config with iSEE and renv options.
dfbautista Jun 5, 2025
5f1a9ed
Unit test run if results/ dir exists.
dfbautista Jun 5, 2025
d163f5c
Fixed dry-run test.
dfbautista Jun 5, 2025
97718cb
Make raw_data path variable in config to point to test data when needed.
dfbautista Jun 5, 2025
78718f7
Added conda environment files.
dfbautista Jul 10, 2025
91e6322
Updated rules with conda envs.
dfbautista Jul 10, 2025
3c31a62
Fixed env file names, config raw path.
dfbautista Jul 10, 2025
10300ab
Finishing adding conda support.
dfbautista Jul 21, 2025
33b9b31
Merge remote-tracking branch 'origin/master' into pytest
dfbautista May 12, 2026
5a0d212
Remove unit tests, added dry-run and core-workflow tests.
dfbautista May 14, 2026
1c96588
Add test fixtures and update Snakemake dry-run commands in integratio…
dfbautista May 14, 2026
a814320
Update CI workflow and test scripts for improved dependency managemen…
dfbautista May 14, 2026
b5d1e6f
Fix pandas sorted input for multiqc rule to ensure unique and sorted …
dfbautista May 14, 2026
43b73c2
Removing conda support for now, using the same config.yaml
dfbautista May 18, 2026
10a6227
Updated test config file and scripts to match main one.
dfbautista May 18, 2026
85ea545
Update configuration and schema files to standardize raw data path us…
dfbautista May 18, 2026
af3e8cb
Remove pytest support to run the whole pipeline for now.
dfbautista May 18, 2026
30 changes: 30 additions & 0 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
name: pytest

on:
  push:
  pull_request:

jobs:
  pytest:
    runs-on: ubuntu-latest
    env:
      FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip
          cache-dependency-path: requirements-dev.txt

      - name: Install test dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install -r requirements-dev.txt

      - name: Run pytest
        run: pytest --tag ci
8 changes: 7 additions & 1 deletion .gitignore
@@ -5,6 +5,12 @@ results/
logs/
benchmarks/
.snakemake/
.pytest_cache/
__pycache__/
tests/fixtures/reference/star_index/*
!tests/fixtures/reference/star_index/.gitkeep
tests/fixtures/reference/salmon_index/*
!tests/fixtures/reference/salmon_index/.gitkeep
config/samplesheet/*
!config/samplesheet/make_units_template*
!config/samplesheet/units.tsv
@@ -13,4 +19,4 @@ config/samplesheet/*
*._.DS_Store
iSEE/.*
tmp/
.Rproj.user
.Rproj.user
43 changes: 43 additions & 0 deletions bin/run_conda_snake.sh
@@ -0,0 +1,43 @@
#!/bin/bash
#SBATCH --export=NONE
#SBATCH -J rnaseq_workflow
#SBATCH -o rnaseq_workflow.o
#SBATCH -e rnaseq_workflow.e
#SBATCH --ntasks 1
#SBATCH --time 120:00:00
#SBATCH --mem=8G
#SBATCH --partition=bbc

cd $SLURM_SUBMIT_DIR

snakemake_module="bbc2/snakemake/snakemake-9.4.0"

module load $snakemake_module

# make logs dir if it does not exist already.
logs_dir="logs/"
[[ -d $logs_dir ]] || mkdir -p $logs_dir


echo "Start snakemake workflow." >&1
echo "Start snakemake workflow." >&2

snakemake \
-p \
--latency-wait 20 \
--sdm conda \
--jobs 100 \
--executor cluster-generic --cluster-generic-submit-cmd "mkdir -p logs/{rule}; sbatch \
-p ${SLURM_JOB_PARTITION} \
--export=ALL \
--nodes 1 \
--ntasks-per-node {threads} \
--mem={resources.mem_gb}G \
-t 120:00:00 \
-o logs/{rule}/{resources.log_prefix}.o \
-e logs/{rule}/{resources.log_prefix}.e" # SLURM hangs if output dir does not exist, so we create it before running sbatch on the snakemake jobs.
#--slurm \
#--default-resources slurm_account=${SLURM_JOB_USER} slurm_partition=${SLURM_JOB_PARTITION}

echo "snakemake workflow done." >&1
echo "snakemake workflow done." >&2
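The submit command above relies on Snakemake filling in `{rule}`, `{threads}`, and `{resources.*}` for each job. A minimal Python sketch of that expansion follows, using plain `str.format` as a stand-in; it is an illustration only, not Snakemake's own formatting code. The partition `bbc` is taken from the `#SBATCH` header, and the `log_prefix` value in the usage line is a hypothetical example.

```python
from types import SimpleNamespace

# Template mirroring --cluster-generic-submit-cmd in bin/run_conda_snake.sh.
# "bbc" stands in for ${SLURM_JOB_PARTITION}, which the shell expands first.
SUBMIT_TEMPLATE = (
    "mkdir -p logs/{rule}; sbatch -p bbc --export=ALL --nodes 1 "
    "--ntasks-per-node {threads} --mem={resources.mem_gb}G -t 120:00:00 "
    "-o logs/{rule}/{resources.log_prefix}.o "
    "-e logs/{rule}/{resources.log_prefix}.e"
)


def render_submit_cmd(rule, threads, mem_gb, log_prefix):
    """Fill the per-job placeholders the way a single job would see them."""
    resources = SimpleNamespace(mem_gb=mem_gb, log_prefix=log_prefix)
    return SUBMIT_TEMPLATE.format(rule=rule, threads=threads, resources=resources)


# Hypothetical job: rule STAR, 8 threads, 64 GB, made-up log prefix.
print(render_submit_cmd("STAR", 8, 64, "STAR.SRR1039508"))
```

The leading `mkdir -p logs/{rule}` runs before `sbatch`, so the log directory exists by the time SLURM opens the `-o` and `-e` files.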
1 change: 1 addition & 0 deletions config/config.yaml
@@ -18,6 +18,7 @@ ref:
orgdb: org.Hs.eg.db
fdr_cutoff: 0.1
genes_of_interest: #DUSP1,KLF15,CRISPLD2 # create table in report of these genes, keep empty if no initial genes of interest.
raw_data_path: raw_data/ #tests/test_raw_data

# For GSEA quick_ref can only handle human, mouse, rat, and fly; all other organisms need to be filled in manually
# kegg_org should be a three or four letter string corresponding to your reference species. List of KEGG species is found here: https://www.genome.jp/kegg/tables/br08606.html
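How the Snakefile consumes `raw_data_path` is not shown in this diff. A minimal sketch, assuming the FASTQ file names from `units.tsv` are simply joined onto that directory; the helper name `fastq_path` and the `raw_data/` fallback are assumptions for illustration.

```python
from pathlib import PurePosixPath


def fastq_path(config, filename):
    """Join the configured raw-data directory with a FASTQ file name.

    Falls back to "raw_data/" when the key is absent (an assumed default
    that matches the value in config/config.yaml).
    """
    raw_dir = config.get("raw_data_path", "raw_data/")
    return str(PurePosixPath(raw_dir) / filename)


main_cfg = {"raw_data_path": "raw_data/"}
test_cfg = {"raw_data_path": "tests/test_raw_data/"}
print(fastq_path(main_cfg, "SRR1039508_L000_R1_001.fastq.gz"))
print(fastq_path(test_cfg, "SRR1039508_L000_R1_001.fastq.gz"))
```

With this shape, switching between production data and the test data only requires changing the one config value.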
3 changes: 3 additions & 0 deletions pytest.ini
@@ -0,0 +1,3 @@
[pytest]
addopts = -ra --symlink --keep-workflow-wd-on-fail
testpaths = tests
6 changes: 6 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,6 @@
pytest>=8.0,<9.0
pytest-workflow>=2.0
PyYAML>=6.0
numpy>=1.26
pandas>=2.0
snakemake==9.13.2
4 changes: 2 additions & 2 deletions schema/units.schema.yaml
@@ -1,5 +1,5 @@
-$id: "http://json-schema.org/draft-06/schema#"
-$schema: "http://json-schema.org/draft-06/schema#"
+$id: 'https://json-schema.org/draft/2020-12/schema'
+$schema: 'https://json-schema.org/draft/2020-12/schema'
description: an entry in the sample sheet
properties:
sample:
19 changes: 19 additions & 0 deletions tests/conftest.py
@@ -0,0 +1,19 @@
from pathlib import Path

import pytest


REPO_ROOT = Path(__file__).resolve().parents[1]
TEST_CONFIG = Path("tests/test_config/config.yaml")


@pytest.fixture(scope="session")
def repo_root():
    return REPO_ROOT


@pytest.fixture(scope="session")
def test_config(repo_root):
    yaml = pytest.importorskip("yaml")
    with (repo_root / TEST_CONFIG).open() as handle:
        return yaml.safe_load(handle)
2 changes: 2 additions & 0 deletions tests/fixtures/reference/annotation.gtf
@@ -0,0 +1,2 @@
chr1 test gene 1 12 . + . gene_id "TEST1"; gene_name "TEST1"; gene_biotype "protein_coding";
chr1 test exon 1 12 . + . gene_id "TEST1"; gene_name "TEST1"; gene_biotype "protein_coding";
1 change: 1 addition & 0 deletions tests/fixtures/reference/fastq_screen.conf
@@ -0,0 +1 @@
# Minimal placeholder config used only for Snakemake dry-run tests.
2 changes: 2 additions & 0 deletions tests/fixtures/reference/genome.dict
@@ -0,0 +1,2 @@
@HD VN:1.6 SO:unsorted
@SQ SN:chr1 LN:12
2 changes: 2 additions & 0 deletions tests/fixtures/reference/genome.fa
@@ -0,0 +1,2 @@
>chr1
ACGTACGTACGT
1 change: 1 addition & 0 deletions tests/fixtures/reference/genome.fa.fai
@@ -0,0 +1 @@
chr1 12 6 12 13
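The five `.fai` columns are sequence name, length, byte offset of the first base, bases per line, and bytes per line. A minimal sketch that recomputes them from the FASTA fixture, assuming Unix line endings and a trailing newline in `genome.fa`; `fai_records` is a hypothetical helper, not a samtools API.

```python
def fai_records(fasta_bytes):
    """Return (name, length, offset, linebases, linewidth) per sequence.

    Assumes well-formed FASTA whose sequence lines are wrapped uniformly,
    which is what the .fai index format requires.
    """
    records = []
    current = None
    position = 0  # byte offset of the line being read
    for line in fasta_bytes.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            name = line[1:].split()[0].decode()
            # The offset points at the first base, just past the header line.
            current = [name, 0, position + len(line), 0, 0]
        elif current is not None:
            bases = line.rstrip(b"\r\n")
            if current[1] == 0:
                # The first sequence line fixes bases-per-line and bytes-per-line.
                current[3], current[4] = len(bases), len(line)
            current[1] += len(bases)
        position += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


# Assumed byte content of tests/fixtures/reference/genome.fa.
FIXTURE = b">chr1\nACGTACGTACGT\n"
print(fai_records(FIXTURE))  # [('chr1', 12, 6, 12, 13)]
```

The header `>chr1` plus its newline occupies 6 bytes, which gives the offset of 6, and the 12-base line plus its newline gives the line width of 13.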
2 changes: 2 additions & 0 deletions tests/fixtures/reference/grouped_contigs.tsv
@@ -0,0 +1,2 @@
name contigs
chr1 chr1
2 changes: 2 additions & 0 deletions tests/fixtures/reference/known_indels.vcf
@@ -0,0 +1,2 @@
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
2 changes: 2 additions & 0 deletions tests/fixtures/reference/known_snps.vcf
@@ -0,0 +1,2 @@
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
1 change: 1 addition & 0 deletions tests/fixtures/reference/salmon_index/.gitkeep
@@ -0,0 +1 @@
placeholder for Snakemake dry-run tests
1 change: 1 addition & 0 deletions tests/fixtures/reference/sortmerna_idx/.gitkeep
@@ -0,0 +1 @@
placeholder for Snakemake dry-run tests
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_rfam_5_8s.fasta
@@ -0,0 +1,2 @@
>rfam_5_8s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_rfam_5s.fasta
@@ -0,0 +1,2 @@
>rfam_5s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_arc_16s.fasta
@@ -0,0 +1,2 @@
>silva_arc_16s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_arc_23s.fasta
@@ -0,0 +1,2 @@
>silva_arc_23s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_bac_16s.fasta
@@ -0,0 +1,2 @@
>silva_bac_16s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_bac_23s.fasta
@@ -0,0 +1,2 @@
>silva_bac_23s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_euk_18s.fasta
@@ -0,0 +1,2 @@
>silva_euk_18s
ACGT
2 changes: 2 additions & 0 deletions tests/fixtures/reference/sortmerna_silva_euk_28s.fasta
@@ -0,0 +1,2 @@
>silva_euk_28s
ACGT
1 change: 1 addition & 0 deletions tests/fixtures/reference/star_index/.gitkeep
@@ -0,0 +1 @@
placeholder for Snakemake dry-run tests
1 change: 1 addition & 0 deletions tests/fixtures/renv_root/.gitkeep
@@ -0,0 +1 @@
placeholder for Snakemake dry-run tests
1 change: 1 addition & 0 deletions tests/test_config/R_proj_packages.txt
102 changes: 102 additions & 0 deletions tests/test_config/config.yaml
@@ -0,0 +1,102 @@
quick_ref:
# Only fill this if you are NOT doing SNP calling. "ref_genome_version" is the directory named for the date and version of the reference; check what is available at /varidata/research/projects/bbc/versioned_references. If you are not sure, use the latest one. "species_name" is the directory name of the reference genome; check /varidata/research/projects/bbc/versioned_references/latest/data/ to see which species are available. The most commonly used species are mmm10_gencode and hg38_gencode. ref_genome_version is optional, whereas species_name is MANDATORY; if you leave the quick_ref section blank, the workflow will use the references from the "ref" section below.
ref_genome_version: # The earliest recommended version is 2021-08-10_11.12.27_v6. Note that the Salmon index might not exist for earlier versions.
species_name:
ref:
index: tests/fixtures/reference/star_index
salmon_index: tests/fixtures/reference/salmon_index
annotation: tests/fixtures/reference/annotation.gtf
dict: tests/fixtures/reference/genome.dict
# Below used only for variant calling
snpeff_db_id: test_ref
known_snps: tests/fixtures/reference/known_snps.vcf
known_indels: tests/fixtures/reference/known_indels.vcf
sequence: tests/fixtures/reference/genome.fa
fai: tests/fixtures/reference/genome.fa.fai

# OrgDB R package for converting gene names. Common choices are 'org.Mm.eg.db' for mouse and 'org.Hs.eg.db' for human.
orgdb: org.Hs.eg.db
fdr_cutoff: 0.1
genes_of_interest: #DUSP1,KLF15,CRISPLD2 # create table in report of these genes, keep empty if no initial genes of interest.
raw_data_path: tests/test_raw_data/

# For GSEA quick_ref can only handle human, mouse, rat, and fly; all other organisms need to be filled in manually
# kegg_org should be a three or four letter string corresponding to your reference species. List of KEGG species is found here: https://www.genome.jp/kegg/tables/br08606.html
kegg_org: hsa
# reactome_org can be "human", "mouse", "rat", "celegans", "yeast", "zebrafish", "fly"
reactome_org: human
# Full species name. Applicable input strings can be found by installing the msigdbr library in R and using msigdbr::msigdbr_species()
msigdb_organism: Homo sapiens
# Choose which gene sets you would like to test against
pathway_str: Reactome,BP,BP-simplified,KEGG,H,C1,C2,C3,C4,C5,C6,C7,C8

numeric_variables:

# are the sequencing reads paired-end ('PE') or single-end ('SE')
PE_or_SE: PE

call_variants: False
grouped_contigs: tests/fixtures/reference/grouped_contigs.tsv

run_vis_bigwig : False
run_rseqc: False

# R project config
Rproj_dirname: "VBCS-000_Rproj"
Rproj_init_git: False
## use renv cache or install/copy all packages in project.
renv_use_cache: True
## copy packages from user library if available?
renv_use_user_lib: True
renv_symlink_from_cache: True #False
# Use Pak to install packages
renv_use_pak: False # I couldn't get pak to install to the renv cache which resulted in rebuilding the library each time; see https://github.com/r-lib/pak/issues/284
## if using renv cache, this is the path to where the cache is/will be stored.
renv_root_path: tests/fixtures/renv_root

# iSEE config
iSEE_app_name: "RNAseq_proj"
deploy_to_shinyio: False
shinyio_account_name: "vai-bbc" # valid account names can be found using rsconnect::accounts(); If blank, follow instructions at https://docs.posit.co/shinyapps.io/guide/getting_started/#configure-rsconnect


####################################################################
# FOR MOST STANDARD USE CASES, THE BELOW DO NOT NEED TO BE CHANGED.#
####################################################################

# path to sample sheet relative to the base project directory (containing config/, workflow/ etc)
units: tests/test_config/samplesheet/units.tsv
comparisons: tests/test_config/samplesheet/comparisons.tsv

sortmerna:
rfam5_8s: tests/fixtures/reference/sortmerna_rfam_5_8s.fasta
rfam5s: tests/fixtures/reference/sortmerna_rfam_5s.fasta
silva_arc_16s: tests/fixtures/reference/sortmerna_silva_arc_16s.fasta
silva_arc_23s: tests/fixtures/reference/sortmerna_silva_arc_23s.fasta
silva_bac_16s: tests/fixtures/reference/sortmerna_silva_bac_16s.fasta
silva_bac_23s: tests/fixtures/reference/sortmerna_silva_bac_23s.fasta
silva_euk_18s: tests/fixtures/reference/sortmerna_silva_euk_18s.fasta
silva_euk_28s: tests/fixtures/reference/sortmerna_silva_euk_28s.fasta
idx_dir: tests/fixtures/reference/sortmerna_idx/

modules:
deeptools: bbc2/deeptools/deeptools-3.5.2
fastqc: bbc2/fastqc/fastqc-0.12.1
fastq_screen: bbc2/fastq_screen/fastq_screen-0.14.0
gatk: bbc2/gatk/gatk-4.3.0.0
htslib: bbc2/htslib/htslib-1.17
multiqc: bbc2/multiqc/multiqc-1.14
pandoc: bbc2/pandoc/pandoc-3.1.2
picard: bbc2/picard/picard-3.0.0
# The easiest way to get renv to work is to make sure all packages are already installed and up to date in your user library which will then be simply copied to the project library
R: bbc2/R/alt/R-4.5.0-setR_LIBS_USER
rseqc: bbc2/rseqc/rseqc-5.0.4
salmon: bbc2/salmon/salmon-1.10.0
samtools: bbc2/samtools/samtools-1.17
seqtk: bbc2/seqtk/seqtk-1.3-r115-dirty
snpeff: bbc2/SnpEff/SnpEff-5.1
sortmerna: bbc2/sortmerna/sortmerna-4.3.6
star: bbc2/STAR/STAR-2.7.10a
trim_galore: bbc2/trim_galore/trim_galore-0.6.10
ucsctools: bbc2/ucsc_tools/ucsc_tools-20231127
vt: bbc2/vt/vt-0.1.16
1 change: 1 addition & 0 deletions tests/test_config/grouped_contigs.tsv
2 changes: 2 additions & 0 deletions tests/test_config/samplesheet/comparisons.tsv
@@ -0,0 +1,2 @@
comparison_name group_test group_reference group_reg_formula
trt_vs_untrt trt untrt ~group
5 changes: 5 additions & 0 deletions tests/test_config/samplesheet/units.tsv
@@ -0,0 +1,5 @@
sample group fq1 fq2 RG
SRR1039508 untrt SRR1039508_L000_R1_001.fastq.gz SRR1039508_L000_R2_001.fastq.gz
SRR1039509 trt SRR1039509_L000_R1_001.fastq.gz SRR1039509_L000_R2_001.fastq.gz
SRR1039512 untrt SRR1039512_L000_R1_001.fastq.gz SRR1039512_L000_R2_001.fastq.gz
SRR1039513 trt SRR1039513_L000_R1_001.fastq.gz SRR1039513_L000_R2_001.fastq.gz
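The two sample sheets are linked: each `group_test` and `group_reference` in `comparisons.tsv` should name a `group` that appears in `units.tsv`. A minimal sketch of that cross-check, assuming both files are tab-separated (the tabs are not visible in this view); `read_tsv` and `undefined_groups` are hypothetical helpers.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def undefined_groups(units_rows, comparison_rows):
    """Return groups referenced by a comparison but absent from the units."""
    known = {row["group"] for row in units_rows}
    referenced = set()
    for row in comparison_rows:
        referenced.update((row["group_test"], row["group_reference"]))
    return sorted(referenced - known)


# Inline copies of the two test sheets (RG column left empty, as in the diff).
UNITS = "\n".join([
    "sample\tgroup\tfq1\tfq2\tRG",
    "SRR1039508\tuntrt\tSRR1039508_L000_R1_001.fastq.gz\tSRR1039508_L000_R2_001.fastq.gz",
    "SRR1039509\ttrt\tSRR1039509_L000_R1_001.fastq.gz\tSRR1039509_L000_R2_001.fastq.gz",
    "SRR1039512\tuntrt\tSRR1039512_L000_R1_001.fastq.gz\tSRR1039512_L000_R2_001.fastq.gz",
    "SRR1039513\ttrt\tSRR1039513_L000_R1_001.fastq.gz\tSRR1039513_L000_R2_001.fastq.gz",
])
COMPARISONS = "\n".join([
    "comparison_name\tgroup_test\tgroup_reference\tgroup_reg_formula",
    "trt_vs_untrt\ttrt\tuntrt\t~group",
])

print(undefined_groups(read_tsv(UNITS), read_tsv(COMPARISONS)))  # []
```

An empty result means every comparison refers to groups that have at least one sample.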
80 changes: 80 additions & 0 deletions tests/test_integration_run.yml
@@ -0,0 +1,80 @@
- name: test-dry-run
  tags:
    - ci
    - snakemake
    - dry-run
  command: >
    env XDG_CACHE_HOME=.cache snakemake --dry-run --printshellcmds --cores 1 --forceall
    --snakefile workflow/Snakefile --configfile tests/test_config/config.yaml
  exit_code: 0
  stderr:
    must_not_contain:
      - 'MissingInputException'
      - 'WorkflowError'
  stdout:
    contains:
      - "rename_fastqs"
      - "trim_galore_PE"
      - "STAR"
      - "salmon"
      - "multiqc"
      - "SummarizedExperiment"
      - "deseq2"
      - "gsea"
      - "make_final_report"
      - "isee"

- name: test-dry-run-optional-targets
  tags:
    - ci
    - snakemake
    - dry-run
  command: >
    env XDG_CACHE_HOME=.cache snakemake
    results/avg_bigwigs/untrt.unstr.bw
    results/rseqc_genebody_cov/SRR1039508/SRR1039508.geneBodyCoverage.txt
    results/iSEE/deployed
    --dry-run --printshellcmds --cores 1 --forceall --snakefile workflow/Snakefile
    --configfile tests/test_config/config.yaml
    --config run_vis_bigwig=True run_rseqc=True deploy_to_shinyio=True
  exit_code: 0
  stderr:
    must_not_contain:
      - 'MissingInputException'
      - 'WorkflowError'
  stdout:
    contains:
      - "bigwigs"
      - "avg_bigwigs"
      - "rseqc_genebody_cov"
      - "deploy_isee_to_shinyappio"

- name: test-dry-run-variant-targets
  tags:
    - ci
    - snakemake
    - dry-run
  command: >
    env XDG_CACHE_HOME=.cache snakemake
    results/variant_calling/final/07a_variant_annot/all.merged.filt.PASS.snpeff.vcf.gz
    results/variant_calling/final/07b_snp_pca_and_dendro/snprelate.html
    --dry-run --printshellcmds --cores 1 --forceall --snakefile workflow/Snakefile
    --configfile tests/test_config/config.yaml
    --config call_variants=True
  exit_code: 0
  stderr:
    must_not_contain:
      - 'MissingInputException'
      - 'WorkflowError'
  stdout:
    contains:
      - "markdups"
      - "splitncigar"
      - "haplotypecaller"
      - "combinevar"
      - "jointgeno"
      - "sortVCF"
      - "filter_vcf"
      - "BQSR"
      - "variant_annot"
      - "snprelate"
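Each test above passes only if the command exits with code 0, every `contains` string appears in stdout, and no `must_not_contain` string appears in stderr. A minimal sketch of those substring checks follows; it illustrates the semantics only and is not pytest-workflow's implementation. `check_stream` is a hypothetical helper and the sample output is invented.

```python
def check_stream(text, contains=(), must_not_contain=()):
    """Return a list of human-readable failures for one output stream."""
    failures = [f"missing: {s!r}" for s in contains if s not in text]
    failures += [f"forbidden: {s!r}" for s in must_not_contain if s in text]
    return failures


# Invented dry-run output, for illustration only.
stdout = "rule rename_fastqs:\nrule STAR:\nrule salmon:\n"
stderr = "Building DAG of jobs...\n"

print(check_stream(stdout, contains=["rename_fastqs", "STAR", "multiqc"]))
print(check_stream(stderr, must_not_contain=["MissingInputException", "WorkflowError"]))
```

Because the checks are plain substring matches, a short string such as "STAR" or "isee" will also match inside longer rule names or file paths, so the `contains` lists confirm presence rather than exact rule names.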
Binary file not shown. (16 binary files)