Finding Informative Regulatory Elements — a Python implementation with single-cell support.
pyFIRE is a ground-up rewrite of FIRE (Elemento, Slonim & Tavazoie, Molecular Cell 2007), an information-theoretic framework for de novo discovery of DNA/RNA regulatory motifs. This version adds native single-cell capabilities inspired by VISION (DeTomaso & Yosef, Nature Communications 2019), integrates with the scverse ecosystem (scanpy / AnnData), and is pip-installable with numba-accelerated internals.
FIRE discovers motifs whose presence in regulatory sequences (promoters, UTRs, enhancers) is statistically associated with gene expression patterns. It uses mutual information as a non-parametric, assumption-free measure of dependence — capturing linear, nonlinear, and combinatorial relationships that correlation-based methods miss.
Sequences + Expression
│
▼
┌──────────────┐
│ K-mer │
│ Screening │
└──────┬───────┘
▼
┌──────────────┐
│ IUPAC │
│ Optimization │
└──────┬───────┘
▼
┌──────────────┐
│ Significance │
│ Testing │
└──────┬───────┘
▼
┌──────────────┐
│ Distribution │
│ & Combine │
└──────┬───────┘
▼
FIREResult
K-mer Screening: Screen all 4^k k-mers by CMI(kmer; expression | length), then keep candidate seeds.
IUPAC Optimization: Greedily expand seeds with degenerate IUPAC codes to maximize CMI; filter redundant motifs.
Significance Testing: Permutation testing (z-scores, p-values) + jackknife robustness.
Distribution & Combine: Test positional/orientation bias and detect synergistic or redundant motif pairs.
pyFIRE supports two single-cell workflows:
De novo discovery — discover motifs directly from scRNA-seq data using per-cell conditional mutual information. For each cell c and motif m, the score is CMI(motif_presence; expression_cell | length) across all genes. This produces a (n_cells, n_motifs) MI matrix.
AnnData + Sequences
│
▼
┌──────────────┐
│ Metacell │
│ Pre-screen │
└──────┬───────┘
▼
┌──────────────┐
│ Per-cell │
│ CMI Matrix │
└──────┬───────┘
▼
┌──────────────┐
│ Spatial │
│ Coherence │
└──────┬───────┘
▼
┌──────────────┐
│ IUPAC │
│ Optimization │
└──────┬───────┘
▼
┌──────────────┐
│ Significance │
│ Testing │
└──────┬───────┘
▼
SingleCellDiscoveryResult
Metacell Pre-screen: Per-cluster pseudo-bulk CMI seed screen with permutation p-values and BH-FDR thresholding.
Per-cell CMI Matrix: Compute CMI(kmer; expression_cell | confounders) for each cell × candidate.
Spatial Coherence: Geary's C on the KNN graph with permutation p-values and BH-FDR thresholding.
IUPAC Optimization: Greedy motif optimization with strict max-frequency cap and CMI/MI redundancy filtering.
Significance Testing: Final Geary's C testing with full permutations and BH-FDR correction.
Post-hoc scoring — given motifs from a bulk run (or supplied externally), score per-cell motif activity as the weighted mean expression of each motif's target genes, then test for spatial coherence on a KNN graph using Geary's C autocorrelation. Motifs whose activity is smoothly distributed across transcriptionally similar cells are reported as significant.
Results are stored directly in AnnData objects.
Sequences of different lengths have different probabilities of containing any given motif by chance. pyFIRE corrects for this by computing conditional mutual information — CMI(motif; expression | length) — throughout the pipeline, rather than raw MI. This conditions out the confounding effect of length while keeping motif counts as simple binary presence/absence (no density normalization needed).
In both bulk and single-cell modes, requested length correction is automatically disabled when sequence lengths are effectively constant (e.g., fixed-width promoter windows). In that case, pyFIRE falls back to a single conditioning state instead of over-correcting.
pip install -e "."With single-cell support (anndata + scanpy):
pip install -e ".[singlecell]"Development (includes pytest, ruff):
pip install -e ".[dev]"- Python >= 3.10
- numpy, scipy, pandas, numba, pyfaidx, click, tqdm, matplotlib
- Optional: anndata, scanpy (for single-cell mode)
# Discover motifs from regulatory sequences + expression data
# Default outputs go to <expression>_FIRE_<mode>/
# (e.g., expression.tsv + mode=dna -> expression_FIRE_dna/pyfire_results_dna.tsv + plots/matrix files)
pyfire discover --fasta promoters.fasta -e expression.tsv
# RNA mode (single-strand scanning + RC-informativeness seed filter)
pyfire discover --fasta utr3.fasta -e expression.tsv --mode rna
# With custom parameters
pyfire discover --fasta promoters.fasta -e expression.tsv \
-k 7 \
--n-seeds 200 \
--seed-permutations 200 \
--seed-pvalue 1e-5 \
# optional alternative rule:
# --seed-z-threshold 3.0 \
--seed-max-freq 0.33 \
# optional override for optimization-stage cap:
# --max-freq 0.33 \
--seed-stop-failures 5 \
--n-permutations 10000 \
--z-threshold 4.0 \
--robustness-threshold 3 \
--output-dir runs/run1
# Select expression columns directly from a multi-column table
# (supports names or 1-based indices)
pyfire discover --fasta promoters.fasta -e data_examples/test_DESeq_logFC.txt.gz \
--cols GENE,log2FoldChange
# equivalent: --cols 8,2
# Disable automatic plotting
pyfire discover --fasta promoters.fasta -e expression.tsv --no-plot
# Scan sequences for known motifs
pyfire scan sequences.fasta motifs.txt -o scan_results.tsv
# Single-cell de novo discovery (with plots)
pyfire discover-sc pbmc.h5ad promoters.fasta \
--groupby celltype \
-k 7 \
--mode dna \
--seed-fdr 0.1 \
--seed-max-freq 0.33 \
--max-freq 0.5 \
--n-permutations 5000 \
-o pbmc_results.h5ad
# Re-plot from saved results without re-running discovery
pyfire draw results.tsv --format pdf --no-logos --lp-max 10
# Re-draw single-cell outputs without re-running discovery
pyfire draw-sc pbmc_scFIRE --groupby celltype --format allUse the CLI helpers (pyfire make-promoter-fasta, pyfire make-utr3-fasta, pyfire make-utr5-fasta) to build sequence FASTA files from gene/transcript identifiers. They support three identifier modes via --entity:
gene(symbol-level; one selected major isoform per symbol)gene_id(ENSG-level; one selected major isoform per ENSG)transcript_id(ENST-level; one record per transcript)
They also write a metadata TSV describing transcript selection, scores, and any skipped records. Metadata now begins with a # command: ... header containing the exact generating command.
Download GENCODE human annotation and place it in fasta/:
mkdir -p fasta
cd fasta
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.basic.annotation.gtf.gz
cd ..Example using data_examples/MDA_tx_abudance_salmon.txt and averaging MDAchr7 replicates:
python - <<'PY'
import pandas as pd
tx = pd.read_csv("data_examples/MDA_tx_abudance_salmon.txt", sep="\t")
tx["transcript_id"] = tx["Unnamed: 0"].str.replace(r"\.\d+$", "", regex=True)
tx["expression"] = tx[["MDAchr7_1", "MDAchr7_2"]].mean(axis=1)
tx[["transcript_id", "expression"]].to_csv("fasta/MDAchr7_tx_scores.tsv", sep="\t", index=False)
PYExample using DESeq2 output data_examples/MDAchr7_gene_logFC.txt:
# Optional but recommended for speed (local batch extraction)
# macOS: brew install bedtools
# Ubuntu/Debian: sudo apt-get install -y bedtools
# Promoters: 2.5 kb upstream + 0.5 kb downstream of TSS
pyfire make-promoter-fasta \
--genes data_examples/MDAchr7_gene_logFC.txt \
--column row \
--gtf fasta/gencode.v49.basic.annotation.gtf.gz \
--sequence-source genome \
--genome fasta/GRCh38.primary_assembly.genome.fa \
--entity gene \
--upstream 2500 \
--downstream 500 \
--fetch-engine bedtools \
--bed-out fasta/MDAchr7_promoters.bed \
--isoform-table fasta/MDAchr7_tx_scores.tsv \
--isoform-id-column transcript_id \
--isoform-score-column expression \
--out fasta/MDAchr7_promoters.fa
# 3'UTRs
pyfire make-utr3-fasta \
--genes data_examples/MDAchr7_gene_logFC.txt \
--column row \
--gtf fasta/gencode.v49.basic.annotation.gtf.gz \
--sequence-source genome \
--genome fasta/GRCh38.primary_assembly.genome.fa \
--entity gene \
--isoform-table fasta/MDAchr7_tx_scores.tsv \
--isoform-id-column transcript_id \
--isoform-score-column expression \
--out fasta/MDAchr7_utr3.fa
# 5'UTRs
pyfire make-utr5-fasta \
--genes data_examples/MDAchr7_gene_logFC.txt \
--column row \
--gtf fasta/gencode.v49.basic.annotation.gtf.gz \
--sequence-source genome \
--genome fasta/GRCh38.primary_assembly.genome.fa \
--entity gene \
--isoform-table fasta/MDAchr7_tx_scores.tsv \
--isoform-id-column transcript_id \
--isoform-score-column expression \
--out fasta/MDAchr7_utr5.faNotes:
- Fastest promoter extraction is local genome +
--fetch-engine bedtools(single batch call). --sequence-source ensemblavoids downloading a full genome FASTA.- For local extraction, use
--sequence-source genome --genome /path/to/genome.fa. make-promoter-fastasupports--fetch-engineand optional--bed-out; UTR commands currently use direct interval extraction.- Metadata files are written as
<out>.metadata.tsvby default. - In
--entity genemode, symbol headers are deduplicated by default (first selected record kept).
from pyfire.discovery.pipeline import FIREPipeline
from pyfire.sequence.fasta import read_fasta_with_lengths
from pyfire.io.expression import read_expression_table, align_sequences_and_expression
from pyfire.io.results import save_results
# Load data
names, sequences, lengths = read_fasta_with_lengths("promoters.fasta")
genes, expression = read_expression_table("expression.tsv")
# Align to common gene set
common_names, common_seqs, common_expr = align_sequences_and_expression(
names, sequences, genes, expression
)
# Run the pipeline
pipeline = FIREPipeline(k=7, n_top_seeds=100, n_permutations=10_000)
result = pipeline.run(common_seqs, common_expr)
# Inspect results
for motif, sig in zip(result.motifs, result.significance):
print(f"{motif.pattern} z={sig.z_score:.1f} robustness={sig.robustness}/10")
# Save
save_results(result, "results.tsv")import scanpy as sc
from pyfire.sc.discovery import SingleCellDiscoveryPipeline
# Load scRNA-seq data (with PCA + neighbors already computed)
adata = sc.read_h5ad("pbmc.h5ad")
# Run de novo motif discovery directly from single-cell data
pipeline = SingleCellDiscoveryPipeline(
k=7, groupby="celltype", n_permutations=5000,
)
result = pipeline.run(
adata,
gene_sequences=sequences, # promoter sequences for genes in adata
gene_names=gene_names,
)
# Results are stored in adata
print(adata.obsm["X_motif_cmi"].shape) # (n_cells, n_motifs) per-cell CMI
print(adata.uns["pyfire_sc_discovery"]["significant_motifs"])
# Inspect discovered motifs
for motif in result.motifs:
print(f"{motif.pattern} mean_CMI={motif.mean_cmi:.4f}")
for name, ac in zip(result.motif_names, result.autocorrelation):
print(f"{name} C'={ac.c_prime:.3f} z={ac.z_score:.1f}")import scanpy as sc
from pyfire.motif.iupac import IUPACPattern
from pyfire.sc.pipeline import SingleCellPipeline
# Load scRNA-seq data (with PCA + neighbors already computed)
adata = sc.read_h5ad("pbmc.h5ad")
# Define motifs (from bulk FIRE or known motifs)
patterns = [IUPACPattern("NACGTACGN"), IUPACPattern("NRYGAAACN")]
# Run single-cell analysis
sc_pipeline = SingleCellPipeline(n_neighbors=30, p_threshold=0.01)
result = sc_pipeline.run(
adata,
patterns=patterns,
gene_sequences=sequences, # promoter sequences for genes in adata
gene_names=gene_names,
)
# Results are stored in adata
print(adata.obsm["X_motif_activity"].shape) # (n_cells, n_motifs)
print(adata.uns["pyfire"]["significant_motifs"])
# Or access from the result object
for name, ac in zip(result.motif_names, result.autocorrelation):
print(f"{name} C'={ac.c_prime:.3f} p={ac.p_value:.4f}")from pyfire.io.anndata_bridge import adata_to_pseudobulk
# Aggregate by cell type for bulk-mode FIRE
expr_bulk, group_names, gene_names = adata_to_pseudobulk(adata, groupby="celltype")
# Then run the bulk pipeline on each column of the resulting matrixStandard FASTA file with one regulatory sequence per gene. Sequence names must match the gene identifiers in the expression file.
>YAL001C
ACGTACGTACGTNNACGT...
>YAL002W
TGCATGCATGCATGCA...
Tab-separated file with gene names in the first column and a continuous expression value in the second column. The expression value can be any continuous measure (log fold-change, mean expression, PC loading, etc.).
gene expression
YAL001C 2.34
YAL002W -1.05
YAL003W 0.72
Plain text file with one IUPAC motif per line:
ACGTRYY
WKGAAACN
TGASTCA
| Parameter | Default | Description |
|---|---|---|
k |
7 | K-mer length for initial screening |
n_top_seeds |
0 | Maximum seeds to optimize after screening (0 = uncapped; screening still stops after consecutive non-significant seeds) |
mode |
dna |
dna: both strands + canonical k-mers; rna: single-strand k-mers filtered to keep only seeds more informative than their reverse complement |
seed_permutations |
10,000 | Permutations per seed for seed-significance tests during screening |
seed_pvalue |
1e-5 | Seed p-value cutoff during screening (default enforces "better than all permutations") |
seed_max_freq |
0.33 | Maximum allowed seed abundance during screening |
max_freq |
seed_max_freq (if omitted) |
Maximum allowed motif abundance during IUPAC optimization |
seed_stop_after_failures |
5 | Stop seed screening after this many consecutive non-significant seeds |
n_expr_bins |
auto | Expression bins; auto = n_genes / (50 * n_motif_bins) |
n_length_bins |
5 | Requested sequence length bins for CMI conditioning (auto-reduced to 1 when lengths are near-constant) |
n_motif_bins |
2 | Motif count bins (2 = binary presence/absence) |
n_permutations |
10,000 | Shuffles for null distribution |
n_jackknife_trials |
10 | Jackknife resamples for robustness |
z_threshold |
4.0 | Minimum z-score to report a motif |
robustness_threshold |
3 | Minimum jackknife passes (out of 10) |
add5 / add3 |
1 / 1 | Flanking positions added during IUPAC optimization |
min_cmi_ratio |
5.0 | CMI ratio threshold for redundancy filtering |
| Parameter | Default | Description |
|---|---|---|
k |
7 | K-mer length for initial screening |
groupby |
auto | Column in adata.obs for cluster labels (tries leiden, louvain, cluster, celltype) |
n_neighbors |
30 | KNN neighbors for graph construction |
n_expr_bins |
5 | Per-cell expression bins (fewer than bulk due to sparsity) |
n_length_bins |
5 | Requested sequence-length bins for conditioning (auto-reduced/disabled when lengths are near-constant) |
n_permutations |
5,000 | Permutations for final Geary's C significance |
p_threshold |
0.01 | BH-FDR threshold for spatial/significance stages |
mode |
dna |
dna: canonical seed k-mers + both-strand motif scanning; rna: strand-specific seeds + single-strand scanning |
seed_fdr |
0.1 | Stage-1 seed BH-FDR threshold |
seed_max_freq |
0.33 | Maximum allowed seed abundance during Stage-1 |
add5 / add3 |
1 / 1 | Flanking positions added during IUPAC optimization |
max_freq |
0.5 | Strict maximum allowed motif frequency during optimization |
min_cmi_ratio |
5.0 | CMI ratio threshold for redundancy filtering |
max_cells_optimize |
5,000 | Subsample to this many cells during optimization |
One row per discovered motif:
| Column | Description |
|---|---|
motif |
IUPAC pattern string |
seed |
Original k-mer seed |
cmi |
Conditional mutual information (bits) |
frequency |
Fraction of sequences containing the motif |
degeneracy |
Number of concrete sequences the pattern matches |
z_score |
Permutation z-score |
p_value |
Empirical p-value |
corrected_p_value |
Bonferroni-corrected p-value |
robustness |
Jackknife robustness score (0–10) |
position_z |
Z-score for positional bias |
position_p |
P-value for positional bias |
orientation_z |
Z-score for orientation (strand) bias |
orientation_p |
P-value for orientation bias |
De novo discovery (discover-sc):
adata.obsm["X_motif_cmi"]—(n_cells, n_motifs)per-cell CMI matrixadata.uns["pyfire_sc_discovery"]— dict with:- motif metrics:
motif_names,seeds,mean_cmi,frequencies - spatial stats:
gearys_c,gearys_c_prime,gearys_z,gearys_p,gearys_fdr - significant set:
significant_motifs - run metadata:
k,mode,confounders,n_permutations,seed_max_freq,max_freq - diagnostics:
seed_table,stage3_table,redundancy_log
- motif metrics:
Post-hoc scoring (known motifs):
adata.obsm["X_motif_activity"]—(n_cells, n_motifs)activity matrixadata.uns["pyfire"]— dict withmotif_names,gearys_c,gearys_c_prime,gearys_z,gearys_p,significant_motifs
The discover command automatically generates figures alongside results (disable with --no-plot):
results.matrix.tsv— enrichment matrix (signed log10 p-values per motif × expression bin), used by thedrawcommandresults.pdf/results.png— FIRE matrix figure: heatmap of motif enrichment across expression bins, with sequence logos, significance borders, and statistics columnsresults.html— interactive HTML version with hover tooltips and inline base64 logosresults.motif_matches.tsv— flat report with one row per motif hit: motif, sequence name, 1-based position, strand, exact matched sequence, and sequence context with 20 bp flanks on both sidescommand.txt— exact run command plus a ready-to-runpyfire draw ...template with optional plotting flags
The discover-sc command writes a structured output directory (default: <adata_stem>_scFIRE/):
run/: run metadatacommand.txt(exact run command + ready-to-rundraw-sccommand)sc_plot_config.json<adata>.scfire.h5ad
discovery/: core discovery tablessc_motifs.tsv,sc_seed_screen.tsv,sc_stage3.tsv,sc_redundancy.tsv,sc_orientation_filter.tsvsc_logos/(motif logo SVGs used by HTML report)
matrices/: plotting/report matricessc_motif_cmi.matrix.tsv,sc_cell_metadata.tsv,sc_gene_motif_matrix.tsv
targets/: motif target summaries (sc_motif_targets.tsv)interactions/: interaction outputssc_interactions.tsv,sc_interactions.heatmap.pdf,sc_interactions.html,sc_interaction_maps/
annotation/: motif annotation/match reportssc_motif_annotations.tsv,sc_mirna_annotations.tsv,sc_motif_matches.tsv
enrichment/: GMT outputssc_gmt_motifs.tsv,sc_gmt_pairs.tsv,sc_gmt_pairs.pdf
plots/: static plot outputssc_motif_heatmap.pdf,sc_embedding_<motif>.pdf,sc_autocorrelation.pdfsc_group_enrichment_<motif>.pdf(per-motif barplots by group; includes significance)sc_group_enrichment_stats.tsv(Kruskal-Wallis + one-vs-rest Mann-Whitney p/FDR table)
sc_report.htmlat output root
sc_report.html includes:
- embedding explorer with interactive point-size and color-range controls + viridis legend
- motif table with logo, annotation, and top GMT hit
- seed diagnostics table (top 20)
- group enrichment barplot panel by selected
groupbymetadata column- metrics: high-activity fraction (
score >= motif P75) or mean motif score - one-vs-rest significance stars from BH-FDR-adjusted Mann-Whitney tests
- metrics: high-activity fraction (
Use pyfire draw to re-render figures from saved results with different visual parameters:
pyfire draw results.tsv --format pdf --lp-max 10 --no-logos --cell-width 24| Draw flag | Description |
|---|---|
--format |
Output format: pdf, png, html, or all (default) |
--matrix-file |
Override enrichment matrix path (default: <results>.matrix.tsv) |
--lp-max |
Color scale range (±value, default 20) |
--sig-threshold |
Significance threshold before Bonferroni (default 0.05) |
--cell-width / --cell-height |
Cell dimensions in points |
--logo-width |
Sequence logo column width in points |
--no-logos / --no-stats / --no-scale / --no-expression-bars |
Toggle sections |
--logo-axes / --no-logo-axes |
Show or hide numbered x-axis and y-axis (bits) for motif logos |
--logo-height-fraction |
Set logo height relative to row height (0-1] |
--logo-axes-left-pad |
Extra left padding before logo column when axes are shown (pt) |
--header-min / --header-max |
Override global min/max used by continuous header bars |
--header-font-size / --stat-font-size |
Font sizes in points |
--dpi |
PNG resolution |
Use pyfire draw-sc to re-render single-cell plots/reports without rerunning discovery:
pyfire draw-sc pbmc_scFIRE --groupby celltype --format allpyFIRE uses the standard IUPAC nucleotide ambiguity codes:
| Code | Bases | Code | Bases |
|---|---|---|---|
| A | A | M | A, C |
| C | C | R | A, G |
| G | G | W | A, T |
| T | T | S | C, G |
| K | G, T | Y | C, T |
| V | A, C, G | H | A, C, T |
| D | A, G, T | B | C, G, T |
| N | A, C, G, T |
During optimization, the algorithm tries all 15 codes at each position, keeping the change that maximizes CMI. Positions derived from the original k-mer seed are constrained to contain the original nucleotide.
src/pyfire/
├── core/ # Computational engines (numba-accelerated)
│ ├── information.py # MI, CMI, batch MI/CMI
│ ├── information_percell.py # Per-cell CMI for single-cell discovery
│ ├── quantization.py # Equal-population binning (bulk + per-cell)
│ └── permutation.py # Permutation null distributions, jackknife
├── motif/ # Motif representations
│ ├── iupac.py # IUPACPattern class
│ ├── kmer.py # 2-bit k-mer encoding and counting
│ └── pwm.py # Position weight matrices
├── sequence/ # Sequence handling
│ ├── fasta.py # FASTA reading (pyfaidx)
│ └── scanner.py # Motif occurrence scanning
├── discovery/ # Bulk FIRE pipeline
│ ├── screen.py # K-mer CMI screening
│ ├── optimize.py # IUPAC greedy optimization
│ ├── significance.py # Permutation + jackknife testing
│ ├── distribution.py # Position/orientation bias
│ ├── combine.py # Motif-motif interactions
│ └── pipeline.py # End-to-end orchestrator
├── sc/ # Single-cell mode
│ ├── discovery.py # De novo SC motif discovery (MI/CMI pipeline)
│ ├── scoring.py # Per-cell motif activity (sparse-aware)
│ ├── neighbors.py # KNN graph construction
│ ├── autocorrelation.py # Geary's C on KNN graph
│ ├── accessibility.py # scATAC (placeholder)
│ └── pipeline.py # Post-hoc SC scoring orchestrator
├── io/ # I/O utilities
│ ├── expression.py # Expression table reading
│ ├── anndata_bridge.py # AnnData integration
│ └── results.py # Result export + enrichment matrix load/save
├── viz/ # Visualization
│ ├── config.py # PlotConfig dataclass (maps to CLI flags)
│ ├── _colors.py # Blue→Black→Yellow FIRE diverging colormap
│ ├── _logo.py # IC-weighted sequence logos (matplotlib TextToPath)
│ ├── _layout.py # MatrixLayout grid engine
│ ├── matrix.py # EnrichmentMatrix + FIREMatrixPlot (PDF/PNG)
│ ├── html.py # FIREMatrixHTML (self-contained interactive HTML)
│ └── matrix_sc.py # SC heatmap, UMAP embedding, autocorrelation bars
└── cli/
└── main.py # Click CLI (discover, scan, discover-sc, draw)
pytest tests/ -v165 tests covering MI/CMI mathematical properties (including per-cell CMI), IUPAC pattern operations, k-mer encoding round-trips, motif scanning, the full bulk discovery pipeline (with planted-motif recovery and no-signal controls), single-cell de novo discovery (metacell screening, per-cell CMI, optimizer, end-to-end pipeline with planted motif), single-cell scoring, Geary's C autocorrelation, visualization (color interpolation, sequence logos, matrix plots, HTML output, single-cell plots), and CLI commands.
- Elemento, O., Slonim, N. & Tavazoie, S. A universal framework for regulatory element discovery across all genomes and data types. Molecular Cell 28, 337–350 (2007).
- DeTomaso, D. & Yosef, N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Systems 12, 446–456 (2021).
MIT