Skip to content

goodarzilab/pyFIRE

Repository files navigation

pyFIRE

License: MIT Python >=3.10 Tests

Finding Informative Regulatory Elements — a Python implementation with single-cell support.

pyFIRE is a ground-up rewrite of FIRE (Elemento, Slonim & Tavazoie, Molecular Cell 2007), an information-theoretic framework for de novo discovery of DNA/RNA regulatory motifs. This version adds native single-cell capabilities inspired by VISION (DeTomaso & Yosef, Nature Communications 2019), integrates with the scverse ecosystem (scanpy / AnnData), and is pip-installable with numba-accelerated internals.

How it works

FIRE discovers motifs whose presence in regulatory sequences (promoters, UTRs, enhancers) is statistically associated with gene expression patterns. It uses mutual information as a non-parametric, assumption-free measure of dependence — capturing linear, nonlinear, and combinatorial relationships that correlation-based methods miss.

Bulk mode

Sequences + Expression
        │
        ▼
  ┌──────────────┐
  │ K-mer        │
  │ Screening    │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ IUPAC        │
  │ Optimization │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ Significance │
  │ Testing      │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ Distribution │
  │ & Combine    │
  └──────┬───────┘
         ▼
     FIREResult

K-mer Screening: Screen all 4^k k-mers by CMI(kmer; expression | length), then keep candidate seeds.
IUPAC Optimization: Greedily expand seeds with degenerate IUPAC codes to maximize CMI; filter redundant motifs.
Significance Testing: Permutation testing (z-scores, p-values) + jackknife robustness.
Distribution & Combine: Test positional/orientation bias and detect synergistic or redundant motif pairs.

Single-cell mode (scRNA-seq)

pyFIRE supports two single-cell workflows:

De novo discovery — discover motifs directly from scRNA-seq data using per-cell conditional mutual information. For each cell c and motif m, the score is CMI(motif_presence; expression_cell | length) across all genes. This produces a (n_cells, n_motifs) MI matrix.

AnnData + Sequences
        │
        ▼
  ┌──────────────┐
  │ Metacell     │
  │ Pre-screen   │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ Per-cell     │
  │ CMI Matrix   │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ Spatial      │
  │ Coherence    │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ IUPAC        │
  │ Optimization │
  └──────┬───────┘
         ▼
  ┌──────────────┐
  │ Significance │
  │ Testing      │
  └──────┬───────┘
         ▼
  SingleCellDiscoveryResult

Metacell Pre-screen: Per-cluster pseudo-bulk CMI seed screen with permutation p-values and BH-FDR thresholding.
Per-cell CMI Matrix: Compute CMI(kmer; expression_cell | confounders) for each cell × candidate.
Spatial Coherence: Geary's C on the KNN graph with permutation p-values and BH-FDR thresholding.
IUPAC Optimization: Greedy motif optimization with strict max-frequency cap and CMI/MI redundancy filtering.
Significance Testing: Final Geary's C testing with full permutations and BH-FDR correction.

Post-hoc scoring — given motifs from a bulk run (or supplied externally), score per-cell motif activity as the weighted mean expression of each motif's target genes, then test for spatial coherence on a KNN graph using Geary's C autocorrelation. Motifs whose activity is smoothly distributed across transcriptionally similar cells are reported as significant.

Results are stored directly in AnnData objects.

Length bias correction

Sequences of different lengths have different probabilities of containing any given motif by chance. pyFIRE corrects for this by computing conditional mutual informationCMI(motif; expression | length) — throughout the pipeline, rather than raw MI. This conditions out the confounding effect of length while keeping motif counts as simple binary presence/absence (no density normalization needed).

In both bulk and single-cell modes, requested length correction is automatically disabled when sequence lengths are effectively constant (e.g., fixed-width promoter windows). In that case, pyFIRE falls back to a single conditioning state instead of over-correcting.

Installation

pip install -e "."

With single-cell support (anndata + scanpy):

pip install -e ".[singlecell]"

Development (includes pytest, ruff):

pip install -e ".[dev]"

Requirements

  • Python >= 3.10
  • numpy, scipy, pandas, numba, pyfaidx, click, tqdm, matplotlib
  • Optional: anndata, scanpy (for single-cell mode)

Quick start

Command line

# Discover motifs from regulatory sequences + expression data
# Default outputs go to <expression>_FIRE_<mode>/
# (e.g., expression.tsv + mode=dna -> expression_FIRE_dna/pyfire_results_dna.tsv + plots/matrix files)
pyfire discover --fasta promoters.fasta -e expression.tsv

# RNA mode (single-strand scanning + RC-informativeness seed filter)
pyfire discover --fasta utr3.fasta -e expression.tsv --mode rna

# With custom parameters
pyfire discover --fasta promoters.fasta -e expression.tsv \
    -k 7 \
    --n-seeds 200 \
    --seed-permutations 200 \
    --seed-pvalue 1e-5 \
    # optional alternative rule:
    # --seed-z-threshold 3.0 \
    --seed-max-freq 0.33 \
    # optional override for optimization-stage cap:
    # --max-freq 0.33 \
    --seed-stop-failures 5 \
    --n-permutations 10000 \
    --z-threshold 4.0 \
    --robustness-threshold 3 \
    --output-dir runs/run1

# Select expression columns directly from a multi-column table
# (supports names or 1-based indices)
pyfire discover --fasta promoters.fasta -e data_examples/test_DESeq_logFC.txt.gz \
    --cols GENE,log2FoldChange
# equivalent: --cols 8,2

# Disable automatic plotting
pyfire discover --fasta promoters.fasta -e expression.tsv --no-plot

# Scan sequences for known motifs
pyfire scan sequences.fasta motifs.txt -o scan_results.tsv

# Single-cell de novo discovery (with plots)
pyfire discover-sc pbmc.h5ad promoters.fasta \
    --groupby celltype \
    -k 7 \
    --mode dna \
    --seed-fdr 0.1 \
    --seed-max-freq 0.33 \
    --max-freq 0.5 \
    --n-permutations 5000 \
    -o pbmc_results.h5ad

# Re-plot from saved results without re-running discovery
pyfire draw results.tsv --format pdf --no-logos --lp-max 10

# Re-draw single-cell outputs without re-running discovery
pyfire draw-sc pbmc_scFIRE --groupby celltype --format all

Extract promoter/UTR FASTA

Use the CLI helpers (pyfire make-promoter-fasta, pyfire make-utr3-fasta, pyfire make-utr5-fasta) to build sequence FASTA files from gene/transcript identifiers. They support three identifier modes via --entity:

  • gene (symbol-level; one selected major isoform per symbol)
  • gene_id (ENSG-level; one selected major isoform per ENSG)
  • transcript_id (ENST-level; one record per transcript)

They also write a metadata TSV describing transcript selection, scores, and any skipped records. Metadata now begins with a # command: ... header containing the exact generating command.

1) Get annotation

Download GENCODE human annotation and place it in fasta/:

mkdir -p fasta
cd fasta
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.basic.annotation.gtf.gz
cd ..

2) (Optional) Build transcript score table for isoform-aware selection

Example using data_examples/MDA_tx_abudance_salmon.txt and averaging MDAchr7 replicates:

python - <<'PY'
import pandas as pd
tx = pd.read_csv("data_examples/MDA_tx_abudance_salmon.txt", sep="\t")
tx["transcript_id"] = tx["Unnamed: 0"].str.replace(r"\.\d+$", "", regex=True)
tx["expression"] = tx[["MDAchr7_1", "MDAchr7_2"]].mean(axis=1)
tx[["transcript_id", "expression"]].to_csv("fasta/MDAchr7_tx_scores.tsv", sep="\t", index=False)
PY

3) Build promoters, 3'UTRs, and 5'UTRs

Example using DESeq2 output data_examples/MDAchr7_gene_logFC.txt:

# Optional but recommended for speed (local batch extraction)
# macOS: brew install bedtools
# Ubuntu/Debian: sudo apt-get install -y bedtools

# Promoters: 2.5 kb upstream + 0.5 kb downstream of TSS
pyfire make-promoter-fasta \
  --genes data_examples/MDAchr7_gene_logFC.txt \
  --column row \
  --gtf fasta/gencode.v49.basic.annotation.gtf.gz \
  --sequence-source genome \
  --genome fasta/GRCh38.primary_assembly.genome.fa \
  --entity gene \
  --upstream 2500 \
  --downstream 500 \
  --fetch-engine bedtools \
  --bed-out fasta/MDAchr7_promoters.bed \
  --isoform-table fasta/MDAchr7_tx_scores.tsv \
  --isoform-id-column transcript_id \
  --isoform-score-column expression \
  --out fasta/MDAchr7_promoters.fa

# 3'UTRs
pyfire make-utr3-fasta \
  --genes data_examples/MDAchr7_gene_logFC.txt \
  --column row \
  --gtf fasta/gencode.v49.basic.annotation.gtf.gz \
  --sequence-source genome \
  --genome fasta/GRCh38.primary_assembly.genome.fa \
  --entity gene \
  --isoform-table fasta/MDAchr7_tx_scores.tsv \
  --isoform-id-column transcript_id \
  --isoform-score-column expression \
  --out fasta/MDAchr7_utr3.fa

# 5'UTRs
pyfire make-utr5-fasta \
  --genes data_examples/MDAchr7_gene_logFC.txt \
  --column row \
  --gtf fasta/gencode.v49.basic.annotation.gtf.gz \
  --sequence-source genome \
  --genome fasta/GRCh38.primary_assembly.genome.fa \
  --entity gene \
  --isoform-table fasta/MDAchr7_tx_scores.tsv \
  --isoform-id-column transcript_id \
  --isoform-score-column expression \
  --out fasta/MDAchr7_utr5.fa

Notes:

  • Fastest promoter extraction is local genome + --fetch-engine bedtools (single batch call).
  • --sequence-source ensembl avoids downloading a full genome FASTA.
  • For local extraction, use --sequence-source genome --genome /path/to/genome.fa.
  • make-promoter-fasta supports --fetch-engine and optional --bed-out; UTR commands currently use direct interval extraction.
  • Metadata files are written as <out>.metadata.tsv by default.
  • In --entity gene mode, symbol headers are deduplicated by default (first selected record kept).

Python API — bulk discovery

from pyfire.discovery.pipeline import FIREPipeline
from pyfire.sequence.fasta import read_fasta_with_lengths
from pyfire.io.expression import read_expression_table, align_sequences_and_expression
from pyfire.io.results import save_results

# Load data
names, sequences, lengths = read_fasta_with_lengths("promoters.fasta")
genes, expression = read_expression_table("expression.tsv")

# Align to common gene set
common_names, common_seqs, common_expr = align_sequences_and_expression(
    names, sequences, genes, expression
)

# Run the pipeline
pipeline = FIREPipeline(k=7, n_top_seeds=100, n_permutations=10_000)
result = pipeline.run(common_seqs, common_expr)

# Inspect results
for motif, sig in zip(result.motifs, result.significance):
    print(f"{motif.pattern}  z={sig.z_score:.1f}  robustness={sig.robustness}/10")

# Save
save_results(result, "results.tsv")

Python API — single-cell de novo discovery

import scanpy as sc
from pyfire.sc.discovery import SingleCellDiscoveryPipeline

# Load scRNA-seq data (with PCA + neighbors already computed)
adata = sc.read_h5ad("pbmc.h5ad")

# Run de novo motif discovery directly from single-cell data
pipeline = SingleCellDiscoveryPipeline(
    k=7, groupby="celltype", n_permutations=5000,
)
result = pipeline.run(
    adata,
    gene_sequences=sequences,      # promoter sequences for genes in adata
    gene_names=gene_names,
)

# Results are stored in adata
print(adata.obsm["X_motif_cmi"].shape)  # (n_cells, n_motifs) per-cell CMI
print(adata.uns["pyfire_sc_discovery"]["significant_motifs"])

# Inspect discovered motifs
for motif in result.motifs:
    print(f"{motif.pattern}  mean_CMI={motif.mean_cmi:.4f}")
for name, ac in zip(result.motif_names, result.autocorrelation):
    print(f"{name}  C'={ac.c_prime:.3f}  z={ac.z_score:.1f}")

Python API — single-cell post-hoc scoring

import scanpy as sc
from pyfire.motif.iupac import IUPACPattern
from pyfire.sc.pipeline import SingleCellPipeline

# Load scRNA-seq data (with PCA + neighbors already computed)
adata = sc.read_h5ad("pbmc.h5ad")

# Define motifs (from bulk FIRE or known motifs)
patterns = [IUPACPattern("NACGTACGN"), IUPACPattern("NRYGAAACN")]

# Run single-cell analysis
sc_pipeline = SingleCellPipeline(n_neighbors=30, p_threshold=0.01)
result = sc_pipeline.run(
    adata,
    patterns=patterns,
    gene_sequences=sequences,      # promoter sequences for genes in adata
    gene_names=gene_names,
)

# Results are stored in adata
print(adata.obsm["X_motif_activity"].shape)  # (n_cells, n_motifs)
print(adata.uns["pyfire"]["significant_motifs"])

# Or access from the result object
for name, ac in zip(result.motif_names, result.autocorrelation):
    print(f"{name}  C'={ac.c_prime:.3f}  p={ac.p_value:.4f}")

Pseudo-bulk from single-cell data

from pyfire.io.anndata_bridge import adata_to_pseudobulk

# Aggregate by cell type for bulk-mode FIRE
expr_bulk, group_names, gene_names = adata_to_pseudobulk(adata, groupby="celltype")

# Then run the bulk pipeline on each column of the resulting matrix

Input formats

Sequences (FASTA)

Standard FASTA file with one regulatory sequence per gene. Sequence names must match the gene identifiers in the expression file.

>YAL001C
ACGTACGTACGTNNACGT...
>YAL002W
TGCATGCATGCATGCA...

Expression (TSV)

Tab-separated file with gene names in the first column and a continuous expression value in the second column. The expression value can be any continuous measure (log fold-change, mean expression, PC loading, etc.).

gene    expression
YAL001C 2.34
YAL002W -1.05
YAL003W 0.72

Motifs (for scanning)

Plain text file with one IUPAC motif per line:

ACGTRYY
WKGAAACN
TGASTCA

Pipeline parameters

Bulk (discover)

Parameter Default Description
k 7 K-mer length for initial screening
n_top_seeds 0 Maximum seeds to optimize after screening (0 = uncapped; screening still stops after consecutive non-significant seeds)
mode dna dna: both strands + canonical k-mers; rna: single-strand k-mers filtered to keep only seeds more informative than their reverse complement
seed_permutations 10,000 Permutations per seed for seed-significance tests during screening
seed_pvalue 1e-5 Seed p-value cutoff during screening (default enforces "better than all permutations")
seed_max_freq 0.33 Maximum allowed seed abundance during screening
max_freq seed_max_freq (if omitted) Maximum allowed motif abundance during IUPAC optimization
seed_stop_after_failures 5 Stop seed screening after this many consecutive non-significant seeds
n_expr_bins auto Expression bins; auto = n_genes / (50 * n_motif_bins)
n_length_bins 5 Requested sequence length bins for CMI conditioning (auto-reduced to 1 when lengths are near-constant)
n_motif_bins 2 Motif count bins (2 = binary presence/absence)
n_permutations 10,000 Shuffles for null distribution
n_jackknife_trials 10 Jackknife resamples for robustness
z_threshold 4.0 Minimum z-score to report a motif
robustness_threshold 3 Minimum jackknife passes (out of 10)
add5 / add3 1 / 1 Flanking positions added during IUPAC optimization
min_cmi_ratio 5.0 CMI ratio threshold for redundancy filtering

Single-cell (discover-sc)

Parameter Default Description
k 7 K-mer length for initial screening
groupby auto Column in adata.obs for cluster labels (tries leiden, louvain, cluster, celltype)
n_neighbors 30 KNN neighbors for graph construction
n_expr_bins 5 Per-cell expression bins (fewer than bulk due to sparsity)
n_length_bins 5 Requested sequence-length bins for conditioning (auto-reduced/disabled when lengths are near-constant)
n_permutations 5,000 Permutations for final Geary's C significance
p_threshold 0.01 BH-FDR threshold for spatial/significance stages
mode dna dna: canonical seed k-mers + both-strand motif scanning; rna: strand-specific seeds + single-strand scanning
seed_fdr 0.1 Stage-1 seed BH-FDR threshold
seed_max_freq 0.33 Maximum allowed seed abundance during Stage-1
add5 / add3 1 / 1 Flanking positions added during IUPAC optimization
max_freq 0.5 Strict maximum allowed motif frequency during optimization
min_cmi_ratio 5.0 CMI ratio threshold for redundancy filtering
max_cells_optimize 5,000 Subsample to this many cells during optimization

Output

Results TSV

One row per discovered motif:

Column Description
motif IUPAC pattern string
seed Original k-mer seed
cmi Conditional mutual information (bits)
frequency Fraction of sequences containing the motif
degeneracy Number of concrete sequences the pattern matches
z_score Permutation z-score
p_value Empirical p-value
corrected_p_value Bonferroni-corrected p-value
robustness Jackknife robustness score (0–10)
position_z Z-score for positional bias
position_p P-value for positional bias
orientation_z Z-score for orientation (strand) bias
orientation_p P-value for orientation bias

Single-cell results (in AnnData)

De novo discovery (discover-sc):

  • adata.obsm["X_motif_cmi"](n_cells, n_motifs) per-cell CMI matrix
  • adata.uns["pyfire_sc_discovery"] — dict with:
    • motif metrics: motif_names, seeds, mean_cmi, frequencies
    • spatial stats: gearys_c, gearys_c_prime, gearys_z, gearys_p, gearys_fdr
    • significant set: significant_motifs
    • run metadata: k, mode, confounders, n_permutations, seed_max_freq, max_freq
    • diagnostics: seed_table, stage3_table, redundancy_log

Post-hoc scoring (known motifs):

  • adata.obsm["X_motif_activity"](n_cells, n_motifs) activity matrix
  • adata.uns["pyfire"] — dict with motif_names, gearys_c, gearys_c_prime, gearys_z, gearys_p, significant_motifs

Visualization

The discover command automatically generates figures alongside results (disable with --no-plot):

  • results.matrix.tsv — enrichment matrix (signed log10 p-values per motif × expression bin), used by the draw command
  • results.pdf / results.png — FIRE matrix figure: heatmap of motif enrichment across expression bins, with sequence logos, significance borders, and statistics columns
  • results.html — interactive HTML version with hover tooltips and inline base64 logos
  • results.motif_matches.tsv — flat report with one row per motif hit: motif, sequence name, 1-based position, strand, exact matched sequence, and sequence context with 20 bp flanks on both sides
  • command.txt — exact run command plus a ready-to-run pyfire draw ... template with optional plotting flags

The discover-sc command writes a structured output directory (default: <adata_stem>_scFIRE/):

  • run/: run metadata
    • command.txt (exact run command + ready-to-run draw-sc command)
    • sc_plot_config.json
    • <adata>.scfire.h5ad
  • discovery/: core discovery tables
    • sc_motifs.tsv, sc_seed_screen.tsv, sc_stage3.tsv, sc_redundancy.tsv, sc_orientation_filter.tsv
    • sc_logos/ (motif logo SVGs used by HTML report)
  • matrices/: plotting/report matrices
    • sc_motif_cmi.matrix.tsv, sc_cell_metadata.tsv, sc_gene_motif_matrix.tsv
  • targets/: motif target summaries (sc_motif_targets.tsv)
  • interactions/: interaction outputs
    • sc_interactions.tsv, sc_interactions.heatmap.pdf, sc_interactions.html, sc_interaction_maps/
  • annotation/: motif annotation/match reports
    • sc_motif_annotations.tsv, sc_mirna_annotations.tsv, sc_motif_matches.tsv
  • enrichment/: GMT outputs
    • sc_gmt_motifs.tsv, sc_gmt_pairs.tsv, sc_gmt_pairs.pdf
  • plots/: static plot outputs
    • sc_motif_heatmap.pdf, sc_embedding_<motif>.pdf, sc_autocorrelation.pdf
    • sc_group_enrichment_<motif>.pdf (per-motif barplots by group; includes significance)
    • sc_group_enrichment_stats.tsv (Kruskal-Wallis + one-vs-rest Mann-Whitney p/FDR table)
  • sc_report.html at output root

sc_report.html includes:

  • embedding explorer with interactive point-size and color-range controls + viridis legend
  • motif table with logo, annotation, and top GMT hit
  • seed diagnostics table (top 20)
  • group enrichment barplot panel by selected groupby metadata column
    • metrics: high-activity fraction (score >= motif P75) or mean motif score
    • one-vs-rest significance stars from BH-FDR-adjusted Mann-Whitney tests

Use pyfire draw to re-render figures from saved results with different visual parameters:

pyfire draw results.tsv --format pdf --lp-max 10 --no-logos --cell-width 24
Draw flag Description
--format Output format: pdf, png, html, or all (default)
--matrix-file Override enrichment matrix path (default: <results>.matrix.tsv)
--lp-max Color scale range (±value, default 20)
--sig-threshold Significance threshold before Bonferroni (default 0.05)
--cell-width / --cell-height Cell dimensions in points
--logo-width Sequence logo column width in points
--no-logos / --no-stats / --no-scale / --no-expression-bars Toggle sections
--logo-axes / --no-logo-axes Show or hide numbered x-axis and y-axis (bits) for motif logos
--logo-height-fraction Set logo height relative to row height (0-1]
--logo-axes-left-pad Extra left padding before logo column when axes are shown (pt)
--header-min / --header-max Override global min/max used by continuous header bars
--header-font-size / --stat-font-size Font sizes in points
--dpi PNG resolution

Use pyfire draw-sc to re-render single-cell plots/reports without rerunning discovery:

pyfire draw-sc pbmc_scFIRE --groupby celltype --format all

IUPAC codes

pyFIRE uses the standard IUPAC nucleotide ambiguity codes:

Code Bases Code Bases
A A M A, C
C C R A, G
G G W A, T
T T S C, G
K G, T Y C, T
V A, C, G H A, C, T
D A, G, T B C, G, T
N A, C, G, T

During optimization, the algorithm tries all 15 codes at each position, keeping the change that maximizes CMI. Positions derived from the original k-mer seed are constrained to contain the original nucleotide.

Architecture

src/pyfire/
├── core/                     # Computational engines (numba-accelerated)
│   ├── information.py        # MI, CMI, batch MI/CMI
│   ├── information_percell.py # Per-cell CMI for single-cell discovery
│   ├── quantization.py       # Equal-population binning (bulk + per-cell)
│   └── permutation.py        # Permutation null distributions, jackknife
├── motif/                    # Motif representations
│   ├── iupac.py              # IUPACPattern class
│   ├── kmer.py               # 2-bit k-mer encoding and counting
│   └── pwm.py                # Position weight matrices
├── sequence/                 # Sequence handling
│   ├── fasta.py              # FASTA reading (pyfaidx)
│   └── scanner.py            # Motif occurrence scanning
├── discovery/                # Bulk FIRE pipeline
│   ├── screen.py             # K-mer CMI screening
│   ├── optimize.py           # IUPAC greedy optimization
│   ├── significance.py       # Permutation + jackknife testing
│   ├── distribution.py       # Position/orientation bias
│   ├── combine.py            # Motif-motif interactions
│   └── pipeline.py           # End-to-end orchestrator
├── sc/                       # Single-cell mode
│   ├── discovery.py          # De novo SC motif discovery (MI/CMI pipeline)
│   ├── scoring.py            # Per-cell motif activity (sparse-aware)
│   ├── neighbors.py          # KNN graph construction
│   ├── autocorrelation.py    # Geary's C on KNN graph
│   ├── accessibility.py      # scATAC (placeholder)
│   └── pipeline.py           # Post-hoc SC scoring orchestrator
├── io/                       # I/O utilities
│   ├── expression.py         # Expression table reading
│   ├── anndata_bridge.py     # AnnData integration
│   └── results.py            # Result export + enrichment matrix load/save
├── viz/                      # Visualization
│   ├── config.py             # PlotConfig dataclass (maps to CLI flags)
│   ├── _colors.py            # Blue→Black→Yellow FIRE diverging colormap
│   ├── _logo.py              # IC-weighted sequence logos (matplotlib TextToPath)
│   ├── _layout.py            # MatrixLayout grid engine
│   ├── matrix.py             # EnrichmentMatrix + FIREMatrixPlot (PDF/PNG)
│   ├── html.py               # FIREMatrixHTML (self-contained interactive HTML)
│   └── matrix_sc.py          # SC heatmap, UMAP embedding, autocorrelation bars
└── cli/
    └── main.py               # Click CLI (discover, scan, discover-sc, draw)

Testing

pytest tests/ -v

165 tests covering MI/CMI mathematical properties (including per-cell CMI), IUPAC pattern operations, k-mer encoding round-trips, motif scanning, the full bulk discovery pipeline (with planted-motif recovery and no-signal controls), single-cell de novo discovery (metacell screening, per-cell CMI, optimizer, end-to-end pipeline with planted motif), single-cell scoring, Geary's C autocorrelation, visualization (color interpolation, sequence logos, matrix plots, HTML output, single-cell plots), and CLI commands.

References

  • Elemento, O., Slonim, N. & Tavazoie, S. A universal framework for regulatory element discovery across all genomes and data types. Molecular Cell 28, 337–350 (2007).
  • DeTomaso, D. & Yosef, N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Systems 12, 446–456 (2021).

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors