Command-line Arguments
| Command | Description |
|---|---|
| Required | |
| -o | <pipeline output directory> name of the directory where the output of your HAMRLNC run will be written |
| -c | <filenames for each fastq.csv> a CSV file that maps each SRR accession (or FASTQ file name) to the name you want to give that set of reads |
| -g | <reference genome.fa> a FASTA file of the genome of the model organism |
| -i | <reference genome annotation.gff3> a GFF3 file of the genome annotation of the model organism; note that GFF3 is required, not GTF |
| Optional | |
| -l | [minimum average read length] default=auto-detect |
| -n | [number of threads] default=4 |
| -r | [perform fastqc] default=false |
| -d | [raw fastq folder] default=NA |
| -t | [trim raw fastq] default=false |
| -D | [raw bam folder] default=NA |
| -b | [sort raw bam] default=false |
| -I | [STAR genome index folder] default=NA |
| -k | [activate modification annotation workflow] default=false |
| -p | [activate lncRNA annotation workflow] default=false |
| -u | [activate featurecount workflow] default=false |
| -f | [HAMR filter] default=filter_SAM_number_hits.pl |
| -m | [HAMR model] default=euk_trna_mods.Rdata |
| -Q | [HAMR minimum quality score: 0-40] default=30 |
| -C | [HAMR minimum coverage: 0-∞] default=10 |
| -E | [HAMR sequencing error: 0-1] default=0.01 |
| -P | [HAMR maximum p-value: 0-1] default=1 |
| -F | [HAMR maximum FDR: 0-1] default=0.05 |
| -O | [Panther organism taxon ID] default="3702" |
| -A | [Panther annotation dataset] default="GO:0008150" |
| -Y | [Panther test type: FISHER or BINOMIAL] default="FISHER" |
| -R | [Panther correction type: FDR, BONFERRONI, or NONE] default="FDR" |
| -y | [keep intermediate bam files] default=false |
| -q | [halt program upon completion of checkpoint 2] default=false |
| -G | [attribute used for featurecount] default="gene_id" |
| -x | [max intron length for lncRNA-annotation-unique STAR mapping] default=NA |
| -H | [SERVER alt path for panther] |
| -U | [SERVER alt path for HAMRLNC] |
| -W | [SERVER alt path for GATK] |
| -S | [SERVER alt path for HAMR] |
| -J | [SERVER alt path for CPC2] |
| -M | [SERVER alt path for Rfam] |
| -h | [help message] |
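
As a rough illustration of how the flags in the table combine, the sketch below assembles an argument list in Python. The executable name `hamrlnc`, the file names, and the helper `build_command` are hypothetical placeholders, since this page does not name the entry point. The sketch also assumes that `-k`, `-p`, and `-u` are bare switches that take no value.

```python
import shlex

# Hypothetical placeholder: this page does not name the executable or wrapper.
EXECUTABLE = "hamrlnc"


def build_command(out_dir, csv_path, genome_fa, annotation_gff3, threads=4,
                  run_mods=False, run_lnc=False, run_featurecount=False):
    """Assemble an argument list from the flags listed in the table."""
    args = [
        EXECUTABLE,
        "-o", out_dir,          # pipeline output directory (required)
        "-c", csv_path,         # CSV mapping SRR codes / FASTQ names to sample names (required)
        "-g", genome_fa,        # reference genome FASTA (required)
        "-i", annotation_gff3,  # reference annotation in GFF3 (required)
        "-n", str(threads),     # number of threads (optional, default=4)
    ]
    # Assumption: the workflow flags are switches without a value.
    if run_mods:
        args.append("-k")  # modification annotation workflow
    if run_lnc:
        args.append("-p")  # lncRNA annotation workflow
    if run_featurecount:
        args.append("-u")  # featurecount workflow
    return args


# Placeholder paths, for illustration only.
cmd = build_command("run1", "names.csv", "genome.fa", "annotation.gff3",
                    threads=8, run_mods=True)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.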
Read depth of the input dataset is an important factor to consider if you want to use HAMRLINC to annotate RNA modifications. For diploid genomes, we recommend a read depth of at least 20 million reads for paired-end data and 40 million reads for single-end data. If your read depth is lower and you have replicates, merge the replicates to increase the depth of coverage.
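
The recommendation above can be turned into a quick pre-flight check. This is a minimal sketch: the function name `depth_report` and the example read counts are made up for illustration, and the thresholds are simply the 20 million and 40 million figures quoted above.

```python
# Recommended minimum read counts quoted above (diploid genomes).
RECOMMENDED_MIN_READS = {"paired": 20_000_000, "single": 40_000_000}


def depth_report(replicate_read_counts, layout):
    """Check each replicate, and the merged replicates, against the recommendation."""
    minimum = RECOMMENDED_MIN_READS[layout]
    merged = sum(replicate_read_counts)
    return {
        "each_replicate_ok": all(n >= minimum for n in replicate_read_counts),
        "merged_reads": merged,
        "merged_ok": merged >= minimum,
    }


# Invented example: two paired-end replicates that are each below 20 million reads.
report = depth_report([12_000_000, 11_500_000], "paired")
print(report)
```

In this invented case neither replicate reaches the recommended depth on its own, but the merged dataset does, which is the situation where merging replicates helps.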
HAMRLNC uses seqkit to automatically detect the read lengths of your input FASTQ files. If your inputs differ in average read length, or you are combining different batches of sequencing experiments in one analysis, the minimum of these averages is used. This keeps the mapping step stringent. You can also set this value yourself with the -l option.
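
To preview the value that would be chosen, or to pick a value for `-l` yourself, the "minimum of the averages" rule can be approximated in plain Python. This sketch is a stand-in for illustration and is not how HAMRLNC or seqkit compute it internally. It assumes standard four-line FASTQ records, and both function names are hypothetical.

```python
import gzip


def average_read_length(fastq_path):
    """Mean sequence length in a FASTQ file (assumes 4-line records)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    total_bases = 0
    n_reads = 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total_bases += len(line.strip())
                n_reads += 1
    if n_reads == 0:
        raise ValueError(f"no reads found in {fastq_path}")
    return total_bases / n_reads


def minimum_average_read_length(fastq_paths):
    """Smallest per-file average read length across all inputs."""
    return min(average_read_length(path) for path in fastq_paths)
```

Calling `minimum_average_read_length` on the list of input FASTQ paths returns the smallest per-file average.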
We recommend that users download the reference genome and annotation files for their sample organism from Ensembl. Annotation files from other sources might lack certain transcript features, such as ncRNA. HAMRLINC relies on the information in the user-supplied annotation file to classify predicted modified transcripts into subtypes.
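
One way to see which feature types an annotation file contains is to tally the third column of the GFF3 file, which holds the feature type. The sketch below does that. The function name and the example records are invented for illustration, and the exact type names that indicate ncRNA depend on the annotation source.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally the feature type (column 3) of every record in a GFF3 file."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9:  # a GFF3 record has nine tab-separated columns
            counts[columns[2]] += 1
    return counts


# Invented records, for illustration only.
example = [
    "##gff-version 3\n",
    "1\tsource\tgene\t100\t900\t.\t+\t.\tID=gene:G1\n",
    "1\tsource\tmRNA\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1\n",
    "1\tsource\tncRNA_gene\t1500\t1800\t.\t-\t.\tID=gene:G2\n",
]
counts = count_feature_types(example)
print(counts.most_common())
```

For a real file, pass an open file handle instead of the list, for example `with open("annotation.gff3") as handle: count_feature_types(handle)`.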
The default parameters have been tested and provide reasonably stringent criteria on top of the layers of statistical analysis that HAMR performs for each modification call. Before changing any of them, we suggest gaining a good understanding of each parameter's function, as outlined in this protocol.
We use Panther's API to generate the GO terms of modified transcripts. Before activating the flag for this analysis, check Panther's website for the list of supported organisms and annotation datasets. Note the relevant IDs and pass them to HAMRLNC with the corresponding flags. For example, to run a GO analysis of molecular functions in a mouse sample, add the flags -O 10090 and -A GO:0003674.
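
The Panther-related flags can be assembled and sanity-checked before a run, as in this minimal sketch. The helper `panther_flags` is hypothetical. Its defaults and allowed values are copied from the argument table, and it does not check the taxon ID or annotation dataset against Panther itself.

```python
# Allowed values as listed in the argument table.
TEST_TYPES = {"FISHER", "BINOMIAL"}
CORRECTION_TYPES = {"FDR", "BONFERRONI", "NONE"}


def panther_flags(taxon_id="3702", annotation="GO:0008150",
                  test_type="FISHER", correction="FDR"):
    """Return the -O/-A/-Y/-R arguments, rejecting unsupported choices."""
    if test_type not in TEST_TYPES:
        raise ValueError(f"test type must be one of {sorted(TEST_TYPES)}")
    if correction not in CORRECTION_TYPES:
        raise ValueError(f"correction must be one of {sorted(CORRECTION_TYPES)}")
    return ["-O", str(taxon_id), "-A", annotation, "-Y", test_type, "-R", correction]


# The mouse molecular-function example from the text above.
mouse = panther_flags(taxon_id="10090", annotation="GO:0003674")
print(" ".join(mouse))
```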