Hi,
I am trying to correct an 80G ONT UL dataset with herro. Both preprocess.sh and create_batched_alignments.sh ran successfully on a large compute node with 1TB of memory. That node has no GPU, so I need to switch to a GPU-equipped node for inference, but the GPU node has only 125GB of memory. When I run the command:

    singularity run --nv herro-0.1.1/herro.sif inference --read-alns batches_dir/ -t 8 -d 0 -m model_R10_v0.1.pt -b 32 preprocessed.fastq.gz corrected_output.fasta

the memory fills up quickly and the job gets interrupted. The same problem persists even after reducing the parameters to -t 1 -b 1.
I am considering splitting the preprocessed reads (the output of preprocess.sh) into several smaller files and running the inference command on each file separately. However, --read-alns batches_dir/ would still point to the alignments generated from the entire 80G ONT UL dataset. Would this approach reduce the accuracy of the corrected sequences? What would be the appropriate way to handle this situation?
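For reference, the splitting step I have in mind would look roughly like the sketch below. The function name `split_fastq`, the chunk file naming, and the chunk size are my own placeholders and not part of herro. It also assumes strict four-line FASTQ records. Whether herro's inference gives correct results for a subset of reads combined with the full batches_dir/ is exactly the part I am unsure about.

```python
import gzip
from pathlib import Path


def split_fastq(src, out_dir, reads_per_chunk):
    """Split a gzipped FASTQ into files of at most `reads_per_chunk` reads.

    Hypothetical helper, not part of herro. Assumes each record is exactly
    four lines (header, sequence, '+', qualities). Lines are streamed one
    at a time, so memory use stays small even for ultra-long reads.
    Returns the list of chunk paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines_per_chunk = 4 * reads_per_chunk
    chunks = []
    fout = None
    try:
        with gzip.open(src, "rt") as fin:
            for i, line in enumerate(fin):
                # Start a new chunk file at every record boundary that
                # falls on a multiple of the chunk size.
                if i % lines_per_chunk == 0:
                    if fout is not None:
                        fout.close()
                    path = out_dir / f"chunk_{len(chunks):04d}.fastq.gz"
                    fout = gzip.open(path, "wt")
                    chunks.append(path)
                fout.write(line)
    finally:
        if fout is not None:
            fout.close()
    return chunks
```

Each resulting chunk would then replace preprocessed.fastq.gz in the inference command above, with one output FASTA per chunk.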