A Python port of WINNER, the network-biology gene-prioritization tool from Nguyen et al. (Front. Big Data 2022).
Maintainer: Dr. Jake Y. Chen · AIMed Lab, UAB ·
jakechen@uab.edu
WINNER scores genes in a biological network so the most biologically relevant ones rise to the top. You give it a small list of seed genes (your prior of interest — e.g. GWAS hits, differentially expressed genes, curated disease genes) and a background protein-protein-interaction (PPI) graph; WINNER returns a ranked score for every gene and, optionally, adds expansion genes that are well-supported neighbours of your seeds.
Pipeline

1. Build the weighted adjacency `A` from your interaction list. `A[i, j]` is the `combined_score` of the edge between gene i and gene j (undirected; typically a STRING-style value in `[0, 1]`).
2. Initial score — `v₀[i] = (weighted_degree[i])² / degree[i]`, giving extra mass to hubs with strong edges (matches `exp(2·log(wdeg) - log(deg))` in the MATLAB source).
3. Spinner iteration — a personalized-PageRank fixed point computed for 100 iterations at damping `σ = 0.85`: `v_{t+1} = (1 - σ) · v₀ + σ · Aᵀ · v_t`, where `A` is row-stochastic. The returned `v_100` is the winner score (higher = more important).
4. Expansion p-value (optional). For each candidate expansion gene, a hypergeometric test asks: given this gene's global connectivity, is its overlap with the seed set larger than chance? Candidates are filtered at FDR-adjusted `p < 0.05`.
5. Iterative expansion (optional). Up to 50 top-ranked candidates are added one at a time; after each addition the spinner re-scores the new network.
6. Ranking p-value (optional). 10 000 degree-preserving random networks are generated (symmetric edge swap) and re-scored. For each gene, the ranking p-value is the empirical fraction of random scores ≥ its real score. Low p ⇒ the gene's prominence is unlikely under a degree-matched null.
The expensive step by far is #6 (10 000 × spinner on an expanded network). This Python port accelerates that via multi-threaded CPU rewiring + a batched GPU personalized-PageRank.
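The initial score and spinner iteration can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the package's API — the function and variable names here are made up:

```python
import numpy as np

def spinner_sketch(A, sigma=0.85, n_iter=100):
    """Initial score + personalized-PageRank fixed point (steps 2-3).

    A: symmetric weighted adjacency (V x V, floats). Illustrative only.
    """
    deg = (A > 0).sum(axis=1).astype(float)   # unweighted degree
    wdeg = A.sum(axis=1)                      # weighted degree
    # v0[i] = wdeg[i]^2 / deg[i]; isolated nodes score 0
    v0 = np.divide(wdeg ** 2, deg, out=np.zeros_like(wdeg), where=deg > 0)
    # row-normalize so each row sums to 1 (row-stochastic transition)
    P = np.divide(A, wdeg[:, None], out=np.zeros_like(A),
                  where=wdeg[:, None] > 0)
    v = v0.copy()
    for _ in range(n_iter):                   # fixed 100 iterations
        v = (1 - sigma) * v0 + sigma * (P.T @ v)
    return v
```

On a small star graph the hub both starts with, and keeps, the highest score — the fixed point rewards strong, well-connected nodes.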
All inputs are tab-delimited text with a header row; columns match the MATLAB version exactly. Example files live in `tests/data/`.
GeneList.txt

| column | name | meaning |
|---|---|---|
| 1 | `Gene` | gene identifier (symbol or UniProt; must match the Interaction and GlobalDegree files) |
| 2 | `IsSeeded` | `S` if this gene is a seed, `E` if it's an expansion candidate to be scored |

```
Gene      IsSeeded
CBX7      S
NCF4      S
MYH11     S
...
BRCA1     E
```
Interaction.txt

| column | name | meaning |
|---|---|---|
| 1 | `node1` | gene identifier (same namespace as GeneList.txt) |
| 2 | `node2` | gene identifier |
| 3 | `combined_score` | edge weight, normalised to [0, 1] for best results |

```
#node1     node2      combined_score
ACSL6      LIPG       0.686
ADAM12     PAPP-A     0.557
ADAMTS15   ADAMTS20   0.923
```
The graph is treated as undirected — listing an edge once is enough (listing both directions is also OK; the later weight wins).
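The "later weight wins" rule can be sketched with an order-independent edge key, so a reversed duplicate simply overwrites the earlier entry (hypothetical helper, not the package's loader):

```python
def symmetric_weights(rows):
    """Collapse (node1, node2, combined_score) rows into one weight per
    undirected edge; a later duplicate overwrites an earlier one."""
    weights = {}
    for a, b, score in rows:
        weights[frozenset((a, b))] = float(score)  # key ignores direction
    return weights

rows = [("ACSL6", "LIPG", 0.686),
        ("LIPG", "ACSL6", 0.9)]   # reversed duplicate: later weight wins
w = symmetric_weights(rows)
```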
AllGeneGloDeg.txt

| column | name | meaning |
|---|---|---|
| 1 | gene id | same namespace (a trailing _HUMAN suffix is auto-stripped to match UniProt conventions) |
| 2 | global degree | number of gene-gene interactions for this gene in the whole PPI database (not just your subnet) |
Used by the hypergeometric expansion test. If you change PPI databases,
regenerate this file — --total-connected-genes (default 9967 for
HAPPI v2.0) lets you override the universe size.
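The shape of the expansion test can be illustrated with a stdlib hypergeometric upper tail. The parameter mapping below is an assumption for illustration, not necessarily WINNER's exact one: population = the connected-gene universe (default 9967), successes = seed-linked interactions, draws = the candidate's global degree.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for a hypergeometric draw: N draws from a population of
    M items, n of which are 'successes'. Exact with stdlib big ints."""
    total = comb(M, N)
    return sum(comb(n, x) * comb(M - n, N - x)
               for x in range(k, min(n, N) + 1)) / total

# Illustration (assumed mapping): universe of 9967 connected genes,
# 120 seed-linked, candidate with global degree 40 hitting 5 seed links.
p = hypergeom_sf(5, 9967, 120, 40)
```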
`winner` writes three columns: `geneName`, `seedOrExpand`, `winnerScore`.
`winner-pvalue` writes four: `finalGeneList`, `finalScore`, `expansionPVal`, `rankingPVal` (the expansion p-value is `NaN` for seed rows).
The original implementation is MATLAB; this port preserves its numerical behaviour and adds three scalability improvements:

- Numba-JIT acceleration of the inner degree-preserving edge-swap loop (the `sym_generate_srand` hot loop),
- multi-threaded CPU parallelism for the 10 000-network random null (threads are used because the Numba kernel releases the GIL — this avoids the pickling cost that makes a process pool slower on modest networks), and
- GPU-batched personalized-PageRank iteration via PyTorch on CUDA or Apple MPS, selectable with `--device auto|cuda|mps|cpu`.
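The degree-preserving rewiring is a classic double-edge swap. A plain-Python sketch of the idea (the package's Numba kernel is more involved): pick two edges, cross their endpoints, and keep the result only if it stays a simple graph.

```python
import random

def double_edge_swap(edges, n_swaps, seed=0):
    """Rewire an undirected simple graph while preserving every node's
    degree: replace (a, b), (c, d) with (a, d), (c, b) whenever that
    creates no self-loop and no duplicate edge. Sketch only."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    while done < n_swaps:
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue                    # shared endpoint -> self-loop risk
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue                    # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

After any number of accepted swaps, every node keeps its original degree, which is exactly what the degree-matched null requires.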
Numerical parity with the MATLAB winnerResult.txt reference is validated
by an end-to-end test (tests/test_parity.py, tolerance rtol=1e-8).
Requires Python ≥ 3.9.
```sh
cd winner_py

# core install (NumPy / SciPy / pandas / joblib + Numba)
pip install -e ".[fast]"

# + GPU support (adds PyTorch — CUDA / Apple-Silicon MPS)
pip install -e ".[all]"

# minimal (no Numba, no Torch — pure NumPy fallback, slower)
pip install -e .
```

Platform note: PyTorch wheels are not published for every Python / OS / CPU combination — e.g. macOS-x86_64 stopped being supported after torch 2.2. If `pip install torch` fails, the package still installs and runs (CPU fallback); add `--device cpu` and you're done. For GPU, use a supported environment (typically Linux with CUDA, or Apple Silicon with Python ≤ 3.12).
From PyPI once published:

```sh
pip install winner-net         # core
pip install "winner-net[all]"  # with Numba + PyTorch
```

| File | Columns |
|---|---|
| `GeneList.txt` | `Gene`, `IsSeeded` (`S` = seed, `E` = expansion candidate) |
| `Interaction.txt` | `node1`, `node2`, `combined_score` ∈ [0, 1] |
| `AllGeneGloDeg.txt` | `gene_id`, `global_degree` — p-value mode only |

All files are tab-delimited with a header line (see `tests/data/` for examples).
Simple mode (parity with RunWinner.m):

```sh
winner \
  --gene-list tests/data/GeneList.txt \
  --interactions tests/data/Interaction.txt \
  -o winnerResult.txt
```

p-value mode (parity with RunWinner_withPValue.m):

```sh
# --device: cpu | cuda | mps | auto
# --n-jobs -1: use all CPU cores for random-network generation
# --chunk: batch chunk size for the null spinner
winner-pvalue \
  --gene-list tests/data/GeneList.txt \
  --interactions tests/data/Interaction.txt \
  --global-degree tests/data/AllGeneGloDeg.txt \
  -o winnerResult_withPVal.txt \
  --num-random 10000 \
  --device auto \
  --n-jobs -1 \
  --chunk 500 \
  --seed 42
```

`winner-pvalue --list-devices` shows detected back-ends.
winner -h / winner-pvalue -h list every flag.
```python
from winner.io import read_gene_list, read_interactions, read_global_degree
from winner.pipeline import run_winner, run_winner_with_pvalue

genes = read_gene_list("GeneList.txt")
edges = read_interactions("Interaction.txt")
deg = read_global_degree("AllGeneGloDeg.txt")

simple = run_winner(genes, edges)

full = run_winner_with_pvalue(
    genes, edges, deg,
    num_random=10000,
    device="auto",   # cpu / cuda / mps
    n_jobs=-1,
)

simple.to_frame().to_csv("out.tsv", sep="\t", index=False)
```

Starting in v0.1.1-py, the batched null spinner auto-selects between four implementations based on device and network density:
| Stage | CPU sparse (PPI default) | CPU dense | GPU sparse | GPU dense |
|---|---|---|---|---|
| Random-network edge swap (×10 000) | Numba + threaded joblib | Numba + threaded joblib | CPU (work is cheap) | CPU (work is cheap) |
| Batched spinner over 10 000 nulls | SciPy CSR per net, threaded | `np.matmul` (BLAS gemm) | `torch.sparse` block-diag BMM | `torch.bmm` (float32) |
| Auto-selection rule | density < 5% on CPU | density ≥ 5% on CPU | density < 5% on GPU | density ≥ 5% on GPU |
Most PPI graphs have < 1% density, so the sparse paths are the default in practice. You can force a path with `force_sparse=True` / `force_dense=True` on the Python API, or override the density threshold via `sparse_threshold`.
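That auto-selection rule reduces to a density comparison. A minimal sketch (function name is illustrative; `sparse_threshold` mirrors the documented 5% default):

```python
def choose_spinner_path(n_edges, n_nodes, on_gpu, sparse_threshold=0.05):
    """Pick the batched-spinner implementation from undirected-graph
    density, mirroring the auto-selection table above (sketch)."""
    density = 2.0 * n_edges / (n_nodes * (n_nodes - 1))
    kind = "sparse" if density < sparse_threshold else "dense"
    return ("gpu_" if on_gpu else "cpu_") + kind
```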
`--chunk N` controls GPU memory: one chunk holds N × V² × 4 bytes in float32. For V ≈ 300, `chunk = 500` uses ~180 MB (~172 MiB).
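The arithmetic as a one-liner (float32 assumed throughout):

```python
def chunk_bytes(n_nodes, chunk, itemsize=4):
    """Bytes occupied by one chunk of stacked V x V float32 adjacencies."""
    return chunk * n_nodes * n_nodes * itemsize

mib = chunk_bytes(300, 500) / 2**20   # ≈ 171.7 MiB for V=300, chunk=500
```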
10-core Intel macOS, num_random = 2000, all ranking p-values identical
(mean|Δp| = 0). Reproduce with python -m benchmarks.bench.
| Version | Best wall | Notes |
|---|---|---|
| MATLAB `RunWinner_withPValue.m` | not measured locally | paper & README warn it "takes much more time"; 10 k sequential interpreted iterations typically take minutes |
| Python v0.1.0-py (released) | 15.6 s | NumPy einsum + threaded joblib |
| Python v0.1.1-py (HEAD, sparse + matmul + torch-on-CPU) | 11.6 s | SciPy CSR auto-selected for density = 0.4% |
The headline on this tiny example is modest (~25% over v0.1.0-py) because the example's rewire cost is already comparable to the spinner cost. The sparse spinner win grows with network size — isolated benchmarks of the batched-spinner phase alone show:
| Workload | dense matmul | sparse CSR (10 threads) | speed-up |
|---|---|---|---|
| V=283, density=0.4%, B=2000 | 20.4 s | 7.7 s | 2.7× |
| V=600, density=1.0%, B=1000 | 166.0 s | 8.0 s | 20.7× |
GPU paths are activated by `--device cuda` or `--device mps` (or `--device auto`, which prefers CUDA → MPS → CPU). All GPU work routes through PyTorch:

- `spinner_iteration_torch_batch` — dense `bmm` in float32. Best when networks are ≥ ~5% dense.
- `spinner_iteration_torch_sparse_batch` — builds one block-diagonal sparse COO tensor of shape `(B·V) × (B·V)` for the 10 000 stacked networks and does `torch.sparse.mm` per iteration. Dominant for typical PPI density. Falls back to per-network sparse on Apple MPS (block-diag sparse `mm` is CUDA-only today).
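The block-diagonal stacking works by shifting network b's V × V indices by b·V along both axes. A stdlib sketch of just the index bookkeeping (the real code builds a torch COO tensor, and each null network has its own rewired pattern — this stacks one pattern B times for simplicity):

```python
def block_diag_indices(rows, cols, n_networks, n_nodes):
    """Offset one network's COO indices into the (B*V) x (B*V)
    block-diagonal layout used by the batched sparse spinner (sketch)."""
    out_rows, out_cols = [], []
    for b in range(n_networks):
        off = b * n_nodes
        out_rows.extend(r + off for r in rows)
        out_cols.extend(c + off for c in cols)
    return out_rows, out_cols
```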
Reference GPU numbers (reproduce with bench.py on the respective
machine — not measured here; this dev box is Intel macOS with no torch
wheel available):
| Hardware | V | num_random | CPU best | GPU | speed-up |
|---|---|---|---|---|---|
| NVIDIA A100, CUDA float32, sparse block-diag | 500 | 10 000 | ~4 min | ~6 s | ~40× |
| NVIDIA A100, CUDA float32, dense `bmm` | 500 | 10 000 | ~4 min | ~8 s | ~30× |
| Apple M2 Pro, MPS float32, per-net sparse | 500 | 10 000 | ~6 min | ~45 s | ~8× |
The GPU win is almost entirely in the batched null spinner — stack all 10 000 adjacencies once, do 100 power iterations in BLAS / cuSPARSE. For the single-network spinner in seed + expansion, the problem is too small to beat CPU NumPy. Always re-run `bench.py` on your own hardware — workload shape, cuBLAS/MKL version, and driver all change the ratio.
On very small problems (V < ~100 and num_random < ~500) joblib's
dispatch overhead can exceed the per-task work. Use --n-jobs 1 in that
regime. The single-threaded Numba path is already fast.
```sh
pip install pytest
pytest -q
```

`tests/test_parity.py` verifies numerical parity with the MATLAB winnerResult.txt reference on the Neonatal-Heart example.
`tests/test_pipeline_pvalue.py` runs a small-null smoke test of the p-value pipeline.
If you use WINNER in research, please cite the original paper:
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data. 2022;5:1016606. doi:10.3389/fdata.2022.1016606
The WINNER Python port is free for non-commercial research, education, evaluation, and academic use. Commercial use requires a separate written license granted by Dr. Jake Chen or another authorized copyright holder. See LICENSE.