A command-line tool to predict the efficacy of small interfering RNAs (siRNAs). This tool is the official implementation of the model described in the paper:
Coffey, R. (2025). Systematic feature and architecture evaluation reveals tokenized learned embeddings enhance siRNA efficacy prediction. bioRxiv [Preprint]. https://doi.org/10.1101/2025.08.12.669916
RN.Ai-Predict can generate and evaluate all possible siRNAs for a target in two ways:
- From a FASTA file: Provide your own mRNA sequence.
- By gene name: Provide a gene name (e.g., "HIF1A"), and the tool will automatically download the necessary transcript data from Ensembl.
This tool takes either an mRNA sequence or a gene name and generates a list of all possible siRNAs of a specified length (19, 20, or 21 nt). For each potential siRNA, it predicts the silencing efficacy and provides a percentile rank relative to all other siRNAs for that gene.
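
The candidate-generation step can be illustrated with a minimal sketch. It makes several assumptions:

- Each candidate is a window of the chosen length, taken at every position along the mRNA.
- The sense strand is identical to the target site.
- The antisense strand is the reverse complement of the sense strand.

The sketch does not model overhangs or any extra filtering that `predict.py` may apply. `enumerate_sirnas` and `percentile_ranks` are hypothetical names. The percentile shown is the share of candidates scoring at or below a given score. That is one common definition, not necessarily the one the tool reports.

```python
from bisect import bisect_right

# Complement table for RNA bases (A<->U, C<->G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def enumerate_sirnas(mrna: str, size: int = 21) -> list[dict]:
    """Slide a window of `size` nt along the mRNA, one nucleotide at a time.

    Hypothetical sketch: the sense strand is taken to be identical to the
    target site and the antisense strand its reverse complement; no overhangs
    or other design rules are applied.
    """
    if size not in (19, 20, 21):
        raise ValueError("size must be 19, 20, or 21")
    rna = mrna.upper().replace("T", "U")  # accept DNA-style (cDNA) input
    candidates = []
    for start in range(len(rna) - size + 1):
        sense = rna[start:start + size]
        antisense = sense.translate(COMPLEMENT)[::-1]  # reverse complement
        candidates.append({"position": start + 1, "sense": sense, "antisense": antisense})
    return candidates


def percentile_ranks(scores: list[float]) -> list[float]:
    """Percent of candidates (0-100) whose score is at or below each score."""
    ordered = sorted(scores)
    return [100.0 * bisect_right(ordered, s) / len(ordered) for s in scores]
```

With this scheme, a 1,000-nt transcript yields 1000 - 21 + 1 = 980 candidate 21-mers. Passing the predicted efficacies to `percentile_ranks` places each candidate relative to the others for that gene.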
The core of the prediction is a model trained on curated public datasets, using a gene-based cross-validation strategy to ensure robust performance on unseen gene targets.
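
To make "gene-based cross-validation" concrete, here is a minimal sketch. It assumes each training example is labeled with the gene it targets. Whole genes, rather than individual siRNAs, are assigned to folds, so every test fold contains only genes absent from the corresponding training set. The function `gene_folds`, the round-robin assignment and the default `k=5` are illustrative assumptions, not the training code used in the paper.

```python
import random


def gene_folds(genes: list[str], k: int = 5, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k (train, test) pairs, keeping each gene in one fold."""
    unique_genes = sorted(set(genes))
    if not 2 <= k <= len(unique_genes):
        raise ValueError("k must be between 2 and the number of distinct genes")
    random.Random(seed).shuffle(unique_genes)  # reproducible gene order
    # Deal whole genes out to folds round-robin, so no gene is split across folds.
    fold_of_gene = {gene: i % k for i, gene in enumerate(unique_genes)}
    splits = []
    for fold in range(k):
        test = [i for i, g in enumerate(genes) if fold_of_gene[g] == fold]
        train = [i for i, g in enumerate(genes) if fold_of_gene[g] != fold]
        splits.append((train, test))
    return splits
```

scikit-learn's `GroupKFold` provides the same guarantee when the gene labels are passed as `groups`. It also balances folds by sample count rather than by gene count.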

Prerequisites:

- `uv`: The primary tool for managing Python dependencies in this project. Please install it before proceeding.

Clone the repository:

```bash
git clone https://github.com/Roco-scientist/RN.Ai-Predict.git
cd RN.Ai-Predict
```

Using a virtual environment is highly recommended. `uv sync` installs the required packages into the project's virtual environment (`.venv`), which you then activate:

```bash
# Install required packages
uv sync
source .venv/bin/activate
```

The script `predict.py` requires exactly one of two input modes:

- predicting from a local FASTA file (`--mrna_fasta`), or
- predicting directly from a gene name (`--gene`).

Required (choose one):

- `--mrna_fasta <path>`: Path to the input FASTA file containing one or more mRNA sequences.
- `--gene <GENE_NAME>`: The official gene symbol (e.g., `HIF1A`, `GAPDH`) for which to predict siRNAs. Currently supports human genes.

Optional:

- `--size <int>`: The length of the siRNAs to generate. Choices are `19`, `20`, or `21`. Defaults to `21`.
- `--model <path>`: Path to the pre-trained `.keras` model file. Defaults to `./RN.Ai-predict.model.keras`.
- `--db_dir <path>`: Directory in which to store the large Ensembl data files that are downloaded. Defaults to `./db`.
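
For readers extending the script, the documented options map naturally onto Python's `argparse`. `--mrna_fasta` and `--gene` go in a required mutually exclusive group. This is a hypothetical sketch, and `build_parser` is an illustrative name. It is not taken from `predict.py`, which may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented predict.py options."""
    parser = argparse.ArgumentParser(description="Predict siRNA knockdown efficacy.")
    # Exactly one input mode must be supplied.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--mrna_fasta", metavar="PATH", help="FASTA file with one or more mRNA sequences")
    mode.add_argument("--gene", metavar="GENE_NAME", help="official human gene symbol, e.g. HIF1A")
    # Optional settings, with the defaults listed in the README.
    parser.add_argument("--size", type=int, choices=[19, 20, 21], default=21, help="siRNA length in nt")
    parser.add_argument("--model", default="./RN.Ai-predict.model.keras", help="pre-trained .keras model")
    parser.add_argument("--db_dir", default="./db", help="directory for downloaded Ensembl data")
    return parser
```

`build_parser().parse_args(["--gene", "HIF1A"])` returns a namespace with `size=21` and `db_dir='./db'`. Passing both `--gene` and `--mrna_fasta` makes argparse exit with a usage error.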

Predicting from a FASTA file (`--mrna_fasta`) is useful when you have a specific mRNA sequence you want to analyze.
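
The `--mrna_fasta` input is a standard FASTA file: each record starts with a `>` header line, followed by one or more sequence lines. The hypothetical `read_fasta` helper below is a minimal sketch of how to read such a file into a header-to-sequence mapping. You could use it, for example, to inspect an input before running a prediction. It is not the parser `predict.py` uses.

```python
from pathlib import Path


def read_fasta(path: str) -> dict[str, str]:
    """Parse a FASTA file into {header: sequence}; wrapped sequence lines are joined."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records
```

In this sketch, a header that appears more than once overwrites the earlier record, so give each sequence a unique header.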
Example:

```bash
python predict.py --mrna_fasta ./example_data/hif1a.fasta --size 21
```

Predicting by gene name (`--gene`) is the easiest way to analyze a human gene. It automatically fetches the latest transcript data from Ensembl.

First-Time Setup: The first time you run a prediction using `--gene`, the script downloads the necessary genome (cDNA) and annotation files from Ensembl.

- By default, these files are stored in a `./db` directory. You can change this location with the `--db_dir` argument.
- This download can take several minutes and may require a few gigabytes of disk space. It is a one-time process for each species/dataset.

Interactive Transcript Selection: After the data is available, you will be presented with an interactive menu in your terminal. This allows you to select the specific transcript(s) you want to target for siRNA design. Use the arrow keys and Enter/Space to make your selections.
Example:

```bash
# Predict for the HIF1A gene using default settings
python predict.py --gene HIF1A

# Predict for the GAPDH gene and store Ensembl data in a different directory
python predict.py --gene GAPDH --db_dir /path/to/my/databases/
```

The script writes the prediction results to a CSV file:

- For FASTA input, the output is `<your_fasta_filename>.sirna_prediction.csv`.
- For gene input, the output is `<GENE_NAME>.sirna_prediction.csv`.

The CSV file contains the following columns:

| Column Name | Description |
|---|---|
| `siRNA_ID` | A unique identifier for the siRNA, based on its position in the mRNA. |
| `siRNA_Sense` | The sequence of the siRNA sense strand. |
| `siRNA_Antisense` | The sequence of the siRNA antisense strand (complementary to the mRNA). |
| `Efficacy_Prediction` | The predicted knockdown efficacy percentage (0-100). |
| `Rank_Order` | The rank of the siRNA's efficacy among all siRNAs for that gene. |
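
As a post-processing example, the sketch below loads a prediction CSV with Python's `csv` module and returns the candidates with the highest predicted knockdown. It assumes a comma-separated file with a header row that uses the column names listed above. `top_sirnas` is a hypothetical helper. It sorts on `Efficacy_Prediction` directly rather than relying on the direction of `Rank_Order`.

```python
import csv


def top_sirnas(csv_path: str, n: int = 10) -> list[dict[str, str]]:
    """Return the n rows with the highest Efficacy_Prediction (hypothetical helper)."""
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))  # one dict per siRNA, keyed by column name
    rows.sort(key=lambda row: float(row["Efficacy_Prediction"]), reverse=True)
    return rows[:n]
```

For instance, `top_sirnas("HIF1A.sirna_prediction.csv", n=5)` returns the five rows with the highest `Efficacy_Prediction` values from a `--gene HIF1A` run.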
If you use this tool or the RN.Ai-Predict model in your research, please cite my paper:

```bibtex
@article{Coffey2025.08.12.669916,
  author = {Coffey, Rory},
  title = {Systematic feature and architecture evaluation reveals tokenized learned embeddings enhance siRNA efficacy prediction},
  journal = {bioRxiv},
  year = {2025},
  doi = {10.1101/2025.08.12.669916},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/08/15/2025.08.12.669916}
}
```

This project is licensed under the terms in the LICENSE file.