Amplicon sequencing analysis pipeline — from raw reads to interactive visualization
A Nextflow DSL2 pipeline for amplicon sequencing analysis. Takes demultiplexed paired-end FASTQ files and produces ASV tables, taxonomy, phylogeny, ordinations, and correlation networks.
nextflow run rec3141/microscape-nf \
--input /path/to/reads \
--ref_databases "silva:/path/to/silva_train_set.fasta:Domain,Phylum,Class,Order,Family,Genus" \
-profile conda \
-resumeThe pipeline pulls its tools from bioconda:
- papa2 — DADA2 denoising (Python,
conda install -c bioconda papa2) - microscape — Downstream analysis (Python,
conda install -c bioconda microscape) - cutadapt — Primer removal
- MAFFT — Multiple sequence alignment
For R users, the pipeline also supports --lang R which uses:
- dada2 — R/Bioconductor
- microscapeR — R companion package
| Stage | Process | Description |
|---|---|---|
| 1 | REMOVE_PRIMERS | Primer trimming with cutadapt |
| 2 | DADA2_FILTER_TRIM | Quality filtering (maxEE, truncQ, PhiX removal) |
| 3 | DADA2_LEARN_ERRORS | Per-plate error model learning |
| 4 | DADA2_DENOISE | Denoising, pair merging, per-plate chimera removal |
| 5 | MERGE_SEQTABS | Merge per-plate sequence tables |
| 6 | REMOVE_CHIMERAS | Consensus chimera removal on merged data |
| 7 | FILTER_SEQTAB | Length, prevalence, abundance, and depth filtering |
| 8 | ASSIGN_TAXONOMY | Naive Bayesian classification (parallel per ref DB) |
| 9 | BUILD_PHYLOGENY | MSA + neighbor-joining tree (optional) |
| 10 | RENORMALIZE | Taxonomic grouping and within-group proportions |
| 11 | ORDINATE | t-SNE ordination of samples and ASVs |
| 12 | NETWORK | SparCC correlation networks |
nextflow run rec3141/microscape-nf --help| Parameter | Default | Description |
|---|---|---|
--input |
required | Directory of paired-end *.fastq.gz files |
--ref_databases |
required | Reference DBs ("name:path:Levels;...") |
--outdir |
results |
Output directory |
--lang |
python |
Language: python (papa2) or R (dada2) |
--maxEE |
2 |
Max expected errors per read |
--truncQ |
11 |
Truncate at first base with quality <= Q |
--min_overlap |
10 |
Min overlap for pair merging |
--run_phylogeny |
false |
Build phylogenetic tree |
--threads |
8 |
CPU threads |
-profile conda # Auto-create conda environments
-profile docker # Use Docker container
-profile singularity # Use Singularity/Apptainer
-profile slurm # Submit to SLURM cluster
-profile test # Reduced resources for testing- papa2 — Python DADA2 port (bioconda)
- microscape — Python downstream analysis (bioconda)
- microscapeR — R downstream analysis (Bioconductor)
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13, 581-583.
BSD-3-Clause — see LICENSE.