Skip to content

treangenlab/Seqwin

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

install with bioconda Anaconda-Server Badge

Seqwin

Seqwin is a lightning‑fast, memory‑efficient toolkit for discovering signature sequences (genomic markers) that balance high sensitivity with high specificity. It builds a minimizer‑based pan‑genome graph across target and neighboring non‑target genomes and extracts signature sequences using a novel graph algorithm. Signatures can be used for downstream assay design such as qPCR, ddPCR, amplicon sequencing and hybrid capture probes.

Seqwin computes minimizers with ntHash, using code adopted from btllib (licensed under the GNU General Public License v3.0).


Table of contents

  1. Installation
  2. Quick start
  3. Citation

See the Seqwin Wiki for full documentation.

Installation

Seqwin is supported on Linux, macOS, and Windows via WSL for x86-64 and AArch64 systems.

Bioconda (recommended)

If Conda is not installed, install it with miniforge or miniconda.

1. Create a new Conda environment "seqwin" and install Seqwin via Bioconda

conda create -n seqwin seqwin \
  --channel conda-forge \
  --channel bioconda \
  --strict-channel-priority

Tip

Setting channel priority is important for Bioconda packages to function properly. You may also persist channel priority settings for all package installation by modifying your ~/.condarc file. For more information, check the Bioconda documentation.

2. Activate the environment and verify the install

conda activate seqwin
seqwin --help

Install from source with pip

Prerequisites

  • Python >=3.10 (with pip and development headers; usually included with official installers)
  • A C++17 compiler (GCC, Clang)
  • zlib development headers/library (zlib)

1. Clone this repository and install with pip
This will build the C++ extension/wrapper and install the required Python dependencies.

git clone https://github.com/treangenlab/Seqwin.git
cd Seqwin
pip install . -v
seqwin --help

2. Install non-Python dependencies
Seqwin can still run without these tools, but some features will be unavailable or skipped. See the Command Line Parameters for details.

Quick start

Identify signatures by providing one or more target taxa and non-target neighboring taxa.

seqwin \
  -t "Salmonella enterica subsp. diarizonae" \
  -n "Salmonella enterica subsp. salamae" \
  -n "Salmonella bongori" \
  --threads 8

Taxa names must be exact matches to NCBI Taxonomy.

Outputs are written to seqwin-out/ in your working directory (see Description of Outputs).

Alternatively, a list of target or non-target genomes can be provided as a text file of file paths. Each line should be the path to a genome FASTA file (plain text or gzipped).

seqwin --tar-paths targets.txt --neg-paths non-targets.txt

Below is an example of targets.txt or non-targets.txt

./genomes/GCA_003718275.1_ASM371827v1_genomic.fna
/data/genomes/GCA_000389055.1_46.E.09_genomic.fna
/data/genomes/GCA_008363955.1_ASM836395v1_genomic.fna.gz

Expected runtime (with --threads 8 or -p 8):

  • ~5min and 2.5GB peak RAM for ~500 bacterial genomes with default settings.
  • ~5min and 23GB peak RAM for ~15k bacterial genomes with --no-blast and --no-mash.

Run seqwin --help or seqwin -h to see the full command line interface.

Citation

If you use Seqwin in your research, please cite:

Michael X. Wang, Bryce Kille, Michael G. Nute, Siyi Zhou, Lauren B. Stadler, and Todd J. Treangen "Seqwin: Ultrafast identification of signature sequences in microbial genomes". Proceedings of ISMB 2026, accepted (2026).

Benchmarking datasets, outputs and scripts are available on Zenodo.