Indexing is really slow. #48

@samhunter

Indexing is very slow. Currently only one file is indexed at a time, which limits ARC to a single processor during indexing. Further testing is needed to determine whether indexing multiple files at once improves overall indexing speed or simply overwhelms disk I/O.

Ideas:

  1. Create an adaptive strategy where parallel indexing processes are launched until the I/O overhead becomes significant (see the Python psutil package).
  2. Launch a fixed number of N indexing processes with N <= nprocs. Maybe make this configurable by the user.
  3. Develop a new strategy for indexing the fastq files and/or recruiting reads (this would address "Add support for gzipped files" #23, "Improve speed and reduce disk IO for read recruitment" #43, and other issues in the way the reads are recruited).
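
For idea 2, a minimal sketch of a fixed-size pool might look like the following. It is not ARC's actual code: `index_file` is a hypothetical placeholder that only counts lines, and `choose_worker_count`, `index_all`, and the `requested` parameter are names made up for illustration. The only real API used is `concurrent.futures.ProcessPoolExecutor` from the standard library.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def index_file(path):
    """Hypothetical placeholder for ARC's real per-file indexing step.

    It only counts lines so the sketch is runnable; returns (path, n_lines).
    """
    n_lines = 0
    with open(path, "rb") as handle:
        for _ in handle:
            n_lines += 1
    return path, n_lines


def choose_worker_count(n_files, requested=None, nprocs=None):
    """Clamp the worker count N so that 1 <= N <= min(nprocs, n_files)."""
    if nprocs is None:
        nprocs = os.cpu_count() or 1
    if requested is None:
        # Assumed default: use every available processor.
        requested = nprocs
    return max(1, min(requested, nprocs, n_files))


def index_all(paths, requested=None):
    """Index every file in `paths` with a fixed pool of N processes."""
    paths = list(paths)
    n = choose_worker_count(len(paths), requested)
    if n == 1:
        # Same behaviour as today: one file at a time, no extra processes.
        return dict(index_file(p) for p in paths)
    with ProcessPoolExecutor(max_workers=n) as pool:
        # At most n files are indexed concurrently; results keep input order.
        return dict(pool.map(index_file, paths))
```

Calling `index_all(paths, requested=N)` for a few values of N and timing each run would be one way to do the testing described above. On platforms that spawn worker processes instead of forking, the call needs to sit under an `if __name__ == "__main__":` guard.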
