Request to incorporate the latest sourmash_plugin_branchwater features and performance improvements in yacht #124

@tnmquann

Hi,

I've been using yacht to reduce false positives in sourmash results, and I wanted to ask whether the tool could be updated to incorporate the latest features from sourmash_plugin_branchwater. This would help for a couple of reasons:

  • Currently, the newest version of yacht only supports processing one sample at a time, which becomes time-consuming when working with many samples.
  • As highlighted in the tutorial, the training process is time-consuming, especially with large databases. I've been training on GTDB-R220 (all genomes) for nearly a week and it has not finished, whereas training on the genomic representatives version took only about a morning. This performance gap is significant.

I believe that improvements such as supporting the new rocksdb data format and using manysketch and/or fastmultigather could reduce processing times and allow multiple samples to be handled simultaneously.
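
To illustrate what I mean by handling multiple samples simultaneously, here is a minimal sketch of a wrapper that launches one command per sample in parallel. It is only a stopgap idea, not how yacht or sourmash_plugin_branchwater work internally. The names `run_per_sample` and `build_cmd` are hypothetical, and the command each sample runs is supplied by the caller, so no yacht arguments are assumed here.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_sample(samples, build_cmd, max_workers=4):
    """Run one external command per sample concurrently.

    samples:   iterable of sample identifiers (e.g. signature file paths)
    build_cmd: hypothetical caller-supplied function mapping a sample to an
               argv list (for example, a single-sample command line)
    Returns a dict mapping each sample to its process exit code.
    """
    def _run(sample):
        # Threads are enough here: the real work happens in the child process.
        proc = subprocess.run(build_cmd(sample), capture_output=True, text=True)
        return sample, proc.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(_run, samples))


if __name__ == "__main__":
    # Placeholder command standing in for a real per-sample invocation.
    demo = run_per_sample(
        ["sample_A", "sample_B"],
        lambda s: [sys.executable, "-c", f"print('processing {s}')"],
        max_workers=2,
    )
    print(demo)
```

If each per-sample run is itself multithreaded, `max_workers` should be kept small so the machine is not oversubscribed. Native multi-sample support would still be preferable, since a wrapper like this reloads the reference data for every sample.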

Thanks for the great tool, and I'm looking forward to potential improvements in future releases!
