Understanding SARS-CoV-2 transmission: Linking traditional contact tracing with genomic surveillance of a SARS-CoV-2 student cluster on a large urban campus
Bioinformatic methods for the iScience paper "Understanding SARS-CoV-2 transmission: Linking traditional contact tracing with genomic surveillance of a SARS-CoV-2 student cluster on a large urban campus"
Sequencing libraries were generated using the ARTIC v4 primer sets and the Illumina COVIDSeq assay according to the manufacturer's protocol. Library pools were sequenced on a NextSeq 500 platform, generating 2 × 75 bp paired-end reads.
Software versions and pipeline steps are outlined below. Processing was performed on the Boston University Shared Computing Cluster (SCC); see the wrapper script, `pipeline.sh`, for full details. Consensus sequences were assigned lineages using the pangolin web portal.
| Package | Version |
|---|---|
| bowtie2 | 2.3.4.1 |
| samtools | 1.15.1 |
| lofreq | 2.1.3.1 |
| R | 4.0.2 |
| pangolin | 4.1.1 |
| pangolin-data | 1.11 |
- FASTQ files were aligned to the Wuhan-Hu-1 reference genome (GenBank MN908947.3) using `bowtie2`
- Primers were soft-clipped from mapped reads using `samtools ampliconclip`
- The alignment BAM file was sorted with `samtools sort`
- Viral genome coverage was calculated from the sorted BAM using `samtools depth`
- Indel qualities were added with `lofreq indelqual`, and the resulting BAM was indexed with `samtools index`
- SNVs were called and quantified with `lofreq call` and annotated with a custom R script
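The step order above can be sketched as a short Python driver. This is a hedged illustration, not a reproduction of `pipeline.sh`. It assumes a prebuilt bowtie2 index of MN908947.3 and an ARTIC v4 primer BED file. The helper names (`pipeline_commands`, `run`), the file names, and flag choices such as `--dindel` and `samtools depth -a` are assumptions made for illustration.

```python
import shlex
import subprocess


def pipeline_commands(sample, r1, r2, ref_fa, bt2_index, primer_bed):
    """Return (argv, stdout_path) pairs for one sample, in the order listed above.

    Hypothetical helper: file names and flag choices are illustrative
    assumptions, not copied from pipeline.sh.
    """
    sam = f"{sample}.sam"
    clipped = f"{sample}.clipped"  # samtools' default output format; sort auto-detects it
    sorted_bam = f"{sample}.sorted.bam"
    depth_txt = f"{sample}.depth.txt"
    iq_bam = f"{sample}.indelqual.bam"
    vcf = f"{sample}.vcf"
    return [
        # Align paired reads to a prebuilt bowtie2 index of MN908947.3 (assumed to exist).
        (["bowtie2", "-x", bt2_index, "-1", r1, "-2", r2, "-S", sam], None),
        # Soft-clip primer sequence using the primer BED coordinates.
        (["samtools", "ampliconclip", "--soft-clip", "-b", primer_bed, "-o", clipped, sam], None),
        (["samtools", "sort", "-o", sorted_bam, clipped], None),
        # Per-base depth at every reference position (-a), captured from stdout.
        (["samtools", "depth", "-a", sorted_bam], depth_txt),
        # Add indel qualities (the Dindel model needs the reference), then index.
        (["lofreq", "indelqual", "--dindel", "-f", ref_fa, "-o", iq_bam, sorted_bam], None),
        (["samtools", "index", iq_bam], None),
        (["lofreq", "call", "-f", ref_fa, "-o", vcf, iq_bam], None),
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv, out_path in commands:
        print(shlex.join(argv) + (f" > {out_path}" if out_path else ""))
        if dry_run:
            continue
        if out_path:
            with open(out_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(pipeline_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
                          "MN908947.3.fasta", "MN908947.3", "artic_v4.primer.bed"))
```

The default dry run only prints the commands. Running with `dry_run=False` requires the tools on `PATH`. Add `--call-indels` to the `lofreq call` step if indel calls are wanted.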
Clusters were assigned using a pairwise SNV difference matrix calculated from the coverage and VCF files. SNVs with <10X read depth were removed, as were SNVs in the 5' and 3' UTRs (positions <265 and >29675, respectively). Only consensus SNVs (allele frequency >50%) were considered. Numeric cluster IDs were assigned using bins of 0, 1, and 2 SNV differences.
| Tool | Version |
|---|---|
| R | 4.0.2 |
| tidyverse | 1.3.0 |
| argparse | 2.1.5 |
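The filtering and clustering described above can be sketched in Python. This is a hedged illustration, not the authors' R script, and it rests on several assumptions:

- Each sample's calls are already parsed into `(pos, alt, depth, af)` tuples.
- Each sample's coverage is already parsed into a set of positions with at least 10X depth.
- Two samples are compared only at positions covered in both.
- Each bin is treated as a single-linkage cutoff: samples are joined when they differ by at most 0, 1, or 2 SNVs.

The helper names and toy samples are hypothetical.

```python
from itertools import combinations

# Filters as stated in the text; positions are 1-based on MN908947.3.
MIN_DEPTH = 10    # SNVs with <10X read depth are dropped
MIN_AF = 0.5      # only consensus SNVs (allele frequency > 50%) are kept
FIRST_POS = 265   # positions < 265 (5' UTR) are dropped
LAST_POS = 29675  # positions > 29675 (3' UTR) are dropped


def consensus_snvs(calls):
    """Filter (pos, alt, depth, af) tuples to consensus SNVs; return {pos: alt}."""
    return {
        pos: alt
        for pos, alt, depth, af in calls
        if depth >= MIN_DEPTH and af > MIN_AF and FIRST_POS <= pos <= LAST_POS
    }


def snv_distance(a, b):
    """Count positions where two samples' consensus alleles differ.

    Each sample is a dict with "snvs" ({pos: alt}) and "covered" (set of
    positions with >= MIN_DEPTH reads). A position is compared only when both
    samples are covered there; this coverage rule is an assumption.
    """
    diffs = 0
    for pos in a["snvs"].keys() | b["snvs"].keys():
        if pos in a["covered"] and pos in b["covered"]:
            # A missing key means the sample carries the reference base.
            if a["snvs"].get(pos) != b["snvs"].get(pos):
                diffs += 1
    return diffs


def assign_clusters(samples, max_diff):
    """Single-linkage clustering (assumed): join samples with distance <= max_diff.

    Returns {sample: cluster_id}; IDs are 1, 2, ... in sorted sample order.
    """
    names = sorted(samples)
    parent = {n: n for n in names}

    def find(n):
        # Walk up to the root of n's group, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for x, y in combinations(names, 2):
        if snv_distance(samples[x], samples[y]) <= max_diff:
            parent[find(x)] = find(y)

    ids, clusters = {}, {}
    for n in names:
        root = find(n)
        ids.setdefault(root, len(ids) + 1)
        clusters[n] = ids[root]
    return clusters


# Hypothetical toy samples, each fully covered across the genome.
covered = set(range(1, 29904))
samples = {
    "S1": {"snvs": consensus_snvs([(3037, "T", 150, 0.99), (23403, "G", 200, 0.98)]),
           "covered": covered},
    # Position 100 lies in the 5' UTR and is filtered out.
    "S2": {"snvs": consensus_snvs([(3037, "T", 120, 0.97), (23403, "G", 180, 0.99),
                                   (100, "A", 50, 0.9)]), "covered": covered},
    # Position 10000 has only 5X depth and is filtered out.
    "S3": {"snvs": consensus_snvs([(3037, "T", 90, 0.95), (23403, "G", 80, 0.96),
                                   (14408, "T", 60, 0.9), (10000, "T", 5, 0.9)]),
           "covered": covered},
    # Position 20000 is below 50% allele frequency and is filtered out.
    "S4": {"snvs": consensus_snvs([(28881, "A", 70, 0.9), (28882, "A", 70, 0.9),
                                   (20000, "C", 100, 0.3)]), "covered": covered},
}

for cutoff in (0, 1, 2):
    print(cutoff, assign_clusters(samples, cutoff))
# 0 {'S1': 1, 'S2': 1, 'S3': 2, 'S4': 3}
# 1 {'S1': 1, 'S2': 1, 'S3': 1, 'S4': 2}
# 2 {'S1': 1, 'S2': 1, 'S3': 1, 'S4': 2}
```

Union-find keeps the single-linkage step cheap for campus-scale sample counts. The quadratic pairwise comparison mirrors the full difference matrix described in the text.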
