This repository was archived by the owner on Jan 31, 2020. It is now read-only.

Speed improvements, especially for sorry genomes #92

Open
joelmartin wants to merge 3 commits into genome:master from joelmartin:speedups


@joelmartin

pindel2vcf runs very slowly on plant genomes that aren't in the very best of shape; the current version can take many days to process pindel output. The changes in this pull request let us process results in a reasonable amount of time.

Changes in this pull request are:
Use the .fai FASTA index to avoid parsing the entire reference file multiple times; previously it was parsed at least once, plus once per contig in the results. The .fai file is currently required by pindel, so I believe it's reasonable to assume it exists.
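As a rough illustration of the idea (not the code in this pull request), the sketch below reads a samtools-style .fai index and seeks directly to one contig instead of scanning the whole FASTA. The names `FaiEntry`, `readFai`, and `fetchSequence` are hypothetical, and the sketch assumes the usual five tab-separated .fai columns (name, length, offset, linebases, linewidth), of which only the first three are used.

```cpp
#include <cstdint>
#include <ios>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// One record of a .fai index (hypothetical struct; only the fields used here).
struct FaiEntry {
    std::uint64_t length = 0;  // number of bases in the sequence
    std::uint64_t offset = 0;  // byte offset of the first base in the FASTA file
};

// Parse lines of the form name<TAB>length<TAB>offset<TAB>linebases<TAB>linewidth.
std::map<std::string, FaiEntry> readFai(std::istream& fai) {
    std::map<std::string, FaiEntry> index;
    std::string line;
    while (std::getline(fai, line)) {
        std::istringstream fields(line);
        std::string name;
        FaiEntry entry;
        if (std::getline(fields, name, '\t') &&
            (fields >> entry.length >> entry.offset)) {
            index[name] = entry;
        }
    }
    return index;
}

// Seek straight to one contig and collect its bases, dropping line breaks.
std::string fetchSequence(std::istream& fasta, const FaiEntry& entry) {
    std::string seq;
    seq.reserve(entry.length);
    fasta.clear();  // reset eof/fail state left by an earlier read
    fasta.seekg(static_cast<std::streamoff>(entry.offset), std::ios::beg);
    std::string line;
    while (seq.size() < entry.length && std::getline(fasta, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        seq += line;
    }
    if (seq.size() > entry.length) seq.resize(entry.length);
    return seq;
}
```

With an index like this, fetching the sequence for each contig in the results costs one seek plus a read of that contig only, rather than another pass over the full reference.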

Index the first occurrence of each chromosome in each pindel result file (_D, _INT, etc.) during the first-pass scan in GetSampleNamesAndChromosomeNames, then use that index to avoid reparsing entire pindel output files for every new contig.
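A minimal sketch of this kind of first-occurrence index, under stated assumptions: it assumes the chromosome name is the token following `ChrID` on a summary line, and `chromosomeOf` and `indexFirstOccurrences` are hypothetical names, not functions from pindel2vcf.

```cpp
#include <ios>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: return the token after "ChrID", or "" if the tag is absent.
std::string chromosomeOf(const std::string& line) {
    const std::string tag = "ChrID";
    const std::string::size_type pos = line.find(tag);
    if (pos == std::string::npos) return "";
    std::istringstream rest(line.substr(pos + tag.size()));
    std::string name;
    rest >> name;
    return name;
}

// Single pass over a result file: remember where each chromosome first appears.
std::map<std::string, std::streampos> indexFirstOccurrences(std::istream& in) {
    std::map<std::string, std::streampos> first;
    std::string line;
    while (true) {
        const std::streampos lineStart = in.tellg();  // offset of the line about to be read
        if (!std::getline(in, line)) break;
        const std::string chrom = chromosomeOf(line);
        // emplace does not overwrite, so the earliest offset is kept
        if (!chrom.empty()) first.emplace(chrom, lineStart);
    }
    return first;
}
```

A later pass for a given contig can then call `clear()` and `seekg()` with the stored offset and start reading from there, instead of rescanning the file from the top.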

Limit calls to isSVSummarizingLine by first checking whether the line starts with a digit.
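The pattern can be sketched as a cheap first-character test that guards a more expensive check. In the sketch below the expensive check is passed in as a callable standing in for isSVSummarizingLine; `startsWithDigit` and `countSummaryLines` are hypothetical names used only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Cheap pre-filter: true only if the first character is a decimal digit.
bool startsWithDigit(const std::string& line) {
    return !line.empty() &&
           std::isdigit(static_cast<unsigned char>(line[0])) != 0;
}

// Count lines accepted by the expensive check, calling it only on lines
// that pass the pre-filter (short-circuit evaluation of &&).
template <typename ExpensiveCheck>
std::size_t countSummaryLines(std::istream& in, ExpensiveCheck isSummary) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (startsWithDigit(line) && isSummary(line)) ++count;
    }
    return count;
}
```

Separator lines and sequence lines fail the digit test immediately, so the expensive check only runs on the small fraction of lines that could be summary lines.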

Use std::getline instead of reading character by character. I've tested std::getline with FASTA sequence of up to 400mb on a single line, and it has no issues. I'm guessing the version note about getline having issues referred to std::istream::getline, which needs buffer management.
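To illustrate the difference between the two getline variants, here is a small sketch (the wrapper names `readLineDynamic` and `readLineFixed` are hypothetical). The free function std::getline grows a std::string as needed, while the member function std::istream::getline writes into a caller-supplied buffer and sets failbit when the line does not fit.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Free function std::getline: the string grows to hold the whole line.
std::string readLineDynamic(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
}

// Member istream::getline: fixed-size buffer supplied by the caller.
// Returns false if the line was longer than the buffer (failbit set).
bool readLineFixed(std::istream& in, char* buffer, std::streamsize size) {
    in.getline(buffer, size);
    return !in.fail();
}
```

With the member function, the caller has to pick a buffer size and handle truncated lines, which would be a plausible source of the issues mentioned in the version note; the free function avoids that bookkeeping.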

Timing:
kitaake - 12 chromosomes followed by 1300 scaffolds ( ~400mb )
v 0.6.3 56 minutes
v 0.6.0 5 minutes
v this 30 seconds

nipponbare - 12 chromosomes and 2 organelles ( ~400mb )
v 0.6.3 241 seconds
v 0.6.0 55 seconds
v this 50 seconds

panicum - 9 chromosomes followed by 8400 scaffolds ( ~550 mb )
result files pre-grepped for ChrID lines
v 0.6.3 killed after 3 days. Estimate over a month.
v 0.6.0 22 hours 46 minutes
v this 41 minutes

clostridium - 1 contig, 3.5mb
v 0.6.3 2 seconds
v 0.6.0 1 second
v this 1 second
