Open
Description
Indexing is very slow. Currently only one file is indexed at a time, which limits ARC to a single processor during indexing. Further testing is needed to determine whether indexing multiple files concurrently will saturate disk I/O or improve overall indexing speed.
Ideas:
- Create an adaptive strategy where parallel indexing processes are launched until the I/O overhead becomes significant (see python psutil).
- Launch a fixed number of N indexing processes with N <= nprocs. Maybe make this configurable by the user.
- Develop a new strategy for indexing the fastq files and/or recruiting reads (this would address #23 "Add support for gzipped files", #43 "Improve speed and reduce disk IO for read recruitment", and other issues with the way the reads are recruited).
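As a starting point for the fixed-N idea, here is a minimal sketch using only the Python standard library. It is not ARC's actual code: `index_file`, `index_all`, and `choose_worker_count` are hypothetical names, and `index_file` just counts lines as a stand-in for the real per-file indexing step. The worker count is clamped so that N <= nprocs, and the executor class is a parameter so the same function can be timed with processes or threads when checking for disk I/O contention.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed


def choose_worker_count(requested=None, nprocs=None):
    """Clamp the requested worker count to the range 1..nprocs."""
    if nprocs is None:
        nprocs = os.cpu_count() or 1
    if requested is None:
        return nprocs
    return max(1, min(requested, nprocs))


def index_file(path):
    """Hypothetical stand-in for the real per-file indexing step."""
    # Placeholder work: count lines so the sketch runs end to end.
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def index_all(paths, n_workers=None, index_func=index_file,
              executor_cls=ProcessPoolExecutor):
    """Index every path using at most n_workers concurrent workers.

    Returns a dict mapping each path to the value index_func returned.
    """
    workers = choose_worker_count(n_workers)
    results = {}
    with executor_cls(max_workers=workers) as pool:
        # Submit all files up front; the pool runs at most `workers` at once.
        futures = {pool.submit(index_func, p): p for p in paths}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Calling `index_all(fastq_paths, n_workers=4)` from the main script (under an `if __name__ == "__main__":` guard) would index up to four files at once. Timing runs with `n_workers` set to 1, 2, 4, and so on would show whether throughput keeps improving or flattens out once the disk becomes the bottleneck.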