Repository containing the code and methods for Bosch & DeJesus et al. 2021
Below is a short legend describing the main files, followed by examples of running the code:
R code with functions useful for running the vulnerability analysis.
R code with functions useful for plotting vulnerability results.
R code for running vulnerability analysis in parallel.
Stan code for the vulnerability model.
R code for creating plots of vulnerability results in parallel.
Python code to process fastq files with reads.
Tools for interacting with the subread aligner. The aligner must be installed via your OS package manager (e.g. sudo apt install subread).
Tools for counting the aligned reads.
To process reads, use the process_reads.py script:
python process_reads.py --library example_lib.fasta example.fastq.gz
Make sure the library file matches what you expect to see in the sequencing reads (e.g. constant sequences, reverse-complement of the sgRNA sequence, etc.). If your library file contains only the original sgRNAs (as is the case for the included .fasta files), you can use the --make_rev_lib, --upstream and --downstream arguments to create a version that matches the reads you expect from the sequencing core.
The --upstream and --downstream arguments specify the constant sequences upstream and downstream of the reverse-complemented sgRNA. For example, if your reads have the structure GAGGTCGAGTACAAAAAC{reverse-complement of sgRNA}TCCCAGATTATATCTAT and your .fasta file contains just the sgRNA sequences, the following command creates a .fasta library that includes those constants and reverse-complements each sgRNA:
python process_reads.py --make_rev_lib --upstream GAGGTCGAGTACAAAAAC --downstream TCCCAGATTATATCTAT --library RLC12.fasta example.fastq.gz
If --upstream and --downstream are left empty, they default to the values used in this paper, i.e.
python process_reads.py --make_rev_lib --library RLC12.fasta example.fastq.gz
would create and use RLC12.rev.fasta that matches the protocol and data for the reads submitted in this paper.
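The transformation that --make_rev_lib is described as performing can be sketched in Python as follows. This is a minimal illustration of the read structure described above, not the code in process_reads.py; the function names and the short sgRNA sequence are hypothetical.

```python
# Illustrative sketch only; not taken from process_reads.py.
# All function names below are hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_read_sequence(sgrna, upstream, downstream):
    """Build upstream + reverse-complement(sgRNA) + downstream."""
    return upstream + reverse_complement(sgrna) + downstream


def make_rev_library(sgrnas, upstream, downstream):
    """Map each sgRNA id to the sequence expected in the reads."""
    return {
        name: expected_read_sequence(seq, upstream, downstream)
        for name, seq in sgrnas.items()
    }


# Constants come from the example read structure above; the sgRNA is made up.
library = make_rev_library(
    {"sgRNA_1": "AAGC"},
    upstream="GAGGTCGAGTACAAAAAC",
    downstream="TCCCAGATTATATCTAT",
)
print(library["sgRNA_1"])
```

Comparing a few such sequences against the first reads in your fastq file is a quick way to confirm that the library matches the data before running the full script.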
To run the analysis with the example parameters, run:
Rscript gene_vul_analysis_parellel.R
Alternatively, users can provide three parameters:
- desired_strain (label to describe strain being analyzed, helpful to distinguish results)
- label (general label to distinguish results)
- data_path (path to the CSV formatted file with passaging data)
Rscript gene_vul_analysis_parellel.R H37Rv test example_H37Rv_data.txt
Similarly, users can create plots of the results by running the following command:
Rscript gene_vul_plots_parallel.R
Alternatively, users can provide four parameters:
- desired_strain (label to describe strain being analyzed, helpful to distinguish results)
- label (general label to distinguish results)
- data_path (path to the CSV formatted file with passaging data)
- results path (path to the directory containing the results from the analysis)
Rscript gene_vul_plots_parallel.R H37Rv test example_H37Rv_data.txt data/H37Rv/test/
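When several strains or labels need to be processed, the analysis and plotting commands can be chained from a small driver script. The following is a minimal Python sketch, not part of the repository; the helper names are hypothetical, and it assumes the results directory follows the data/<strain>/<label>/ pattern seen in the example path above, which may not hold for your setup.

```python
import subprocess


def build_commands(strain, label, data_path, results_root="data"):
    """Return the analysis and plotting commands as argument lists."""
    # Assumption: results are written to <results_root>/<strain>/<label>/,
    # inferred from the example path data/H37Rv/test/.
    results_path = f"{results_root}/{strain}/{label}/"
    analysis = ["Rscript", "gene_vul_analysis_parellel.R",
                strain, label, data_path]
    plots = ["Rscript", "gene_vul_plots_parallel.R",
             strain, label, data_path, results_path]
    return analysis, plots


def run_pipeline(strain, label, data_path, dry_run=True):
    """Run the analysis, then the plots; print the commands if dry_run."""
    for cmd in build_commands(strain, label, data_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)


# Dry run with the example parameters from this README.
run_pipeline("H37Rv", "test", "example_H37Rv_data.txt")
```

Set dry_run=False to actually invoke Rscript once the printed commands look right.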