Indexing is really slow.

Indexing is very slow. Currently only one file is indexed at a time, which limits ARC to a single processor during indexing. Further testing is needed to determine whether indexing multiple files concurrently would saturate disk I/O or improve overall indexing speed.

Ideas:
1) Create an adaptive strategy where parallel indexing processes are launched until the I/O overhead becomes significant (see the Python psutil package).
2) Launch a fixed number N of indexing processes, with N <= nprocs. This could be made configurable by the user.
3) Develop a new strategy for indexing the fastq files and/or recruiting reads (address #23, #43, and other issues in the way the reads are recruited).
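
A minimal sketch of idea 2, assuming the per-file indexing step can be wrapped in a single function. `index_file`, `index_all`, and `worker_count` are hypothetical names, not part of ARC; `index_file` is only a placeholder that returns the file size so the example runs on its own.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_count(requested=None, nprocs=None):
    """Clamp a user-requested worker count to the range 1..nprocs."""
    nprocs = nprocs or os.cpu_count() or 1
    if requested is None:
        return nprocs
    return max(1, min(requested, nprocs))


def index_file(path):
    # Hypothetical placeholder for ARC's real per-file indexing step.
    return path, os.path.getsize(path)


def index_all(paths, requested=None):
    """Index files with a fixed pool of N <= nprocs worker processes."""
    n = worker_count(requested)
    with ProcessPoolExecutor(max_workers=n) as pool:
        # map() preserves input order; each file is handled by one worker.
        return list(pool.map(index_file, paths))


if __name__ == "__main__":
    # Index this script itself with at most two workers.
    print(index_all([__file__], requested=2))
```

Timing `index_all` with `requested=1`, then 2, 4, and so on, over the same set of fastq files would be one way to check whether disk I/O becomes the bottleneck before all processors are in use.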


Indexing is really slow. #48
