tRNA ionic current model and alignment tools
To install the repository:

```bash
git clone https://github.com/genometechlab/tRNA_zap.git
cd tRNA_zap
pip install -e .
```

The Splitter module classifies and segments nanopore ionic current signals into biologically relevant regions.
To use the module: 1) download a model and its configuration file from the available models table below, and 2) follow the example to learn how to run inference.
| Model Name | Config File | Model Weights | Description |
|---|---|---|---|
| zap_s54_c127 | zap_s54_c127.yaml | zap_s54_c127.pth | Standard classifier trained on yeast and E. coli |
| zap_s54_c49_IVTecoli | zap_s54_c49_IVTecoli.yaml | zap_s54_c49_IVTecoli.pth | Standard classifier trained on E. coli only |
Please download one of the models and its config file from the table above and place them in the following structure:

```text
<your_project_root>/
├── configs/
│   └── <model_config>.yaml
├── checkpoints/
│   └── <model_weights>.pth
├── your_script.py
└── ...
```
💡 Tip: The path to the model weights file is specified inside the YAML configuration (`checkpoint_path`). If you move the weights to a different directory, be sure to update that path in the config file accordingly.
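Since the weights path lives in the config, a minimal config sketch might look like the following. Only `checkpoint_path` is confirmed by this guide; any other fields are placeholders, not the actual schema:

```yaml
# Hypothetical sketch — only checkpoint_path is a confirmed key.
checkpoint_path: checkpoints/zap_s54_c127.pth
# ... other model/inference settings defined by the real config ...
```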
The inference module runs the model on each of the provided read IDs.
For each read, it performs the classification task,
and segments the signal into the variable region, ONT adapter, and 3' and 5' splint regions.
```python
# From the splitter module, import the Inference and ResultsVisualizer classes
from trnazap.splitter import Inference, ResultsVisualizer

# Specify your pod5 paths. This can be a single file or a list of directories
pod5_pth = ['Path/To/pod5/file', 'Path/to/pod5/dir1', 'Path/to/pod5/dir2', ...]

# Specify the reads you want to run inference on
desired_reads = [...]  # a list of read IDs as strings

# Load the inference engine from a configuration file
config_pth = "/path/to/config.yaml"

# Device used for inference. For fast inference, use a CUDA-enabled GPU;
# CPU is fine for a small number of read IDs.
device = "cuda"
infer_engine = Inference(config_pth, device=device)

# Run inference
results = infer_engine.predict(
    pod5_paths=pod5_pth,
    read_ids=desired_reads,  # if not provided, inference runs on every read ID in the pod5s
    batch_size=2048,  # number of read IDs processed in one batch
)
```

You will get an `InferenceResults` object as the return value of `Inference.predict(...)`.
An explanation of how to use and interact with this instance is provided below.
The InferenceResults object is a lightweight container that stores all outputs from an inference run, indexed by read ID. It also includes relevant metadata and supports basic persistence and inspection.
- Indexing (`results[read_id]`): returns the inference result for a specific read ID. Raises `KeyError` if not found.

  ```python
  results = infer_engine.predict(...)
  read_result = results["read_abc123"]
  ```

- Membership (`in`): checks whether a read is present.

  ```python
  if "read_abc123" in results:
      ...
  ```

- `read_ids`: returns a list of all read IDs in the result set.

  ```python
  all_ids = results.read_ids
  ```

- `save`: saves the full results object to a `.pkl` file.

  ```python
  results.save("/path/to/save/results.pkl")
  ```

- `load`: loads a previously saved results object from disk.

  ```python
  results = InferenceResults.load("/path/to/saved/results.pkl")
  ```

- `summary()`: returns a dictionary with summary statistics about the inference run:
  - number of reads
  - total chunks
  - chunk size
  - model type
  - device
  - inference timestamp
  - total inference time

  ```python
  summary = results.summary()
  ```

- `label_names`: returns a mapping of label indices to class names.

  ```python
  label_names = results.label_names
  ```
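As a sketch of how the index-to-name mapping is typically used (the mapping values and predictions here are made up for illustration; the real ones come from `results.label_names` and the model outputs):

```python
# Hypothetical label mapping — the real one comes from results.label_names
label_names = {0: "adapter", 1: "splint_5p", 2: "variable", 3: "splint_3p"}

# Hypothetical per-chunk class indices, e.g. from a ReadResult
chunk_preds = [0, 1, 2, 2, 3]

# Convert class indices to human-readable names
named_preds = [label_names[i] for i in chunk_preds]
print(named_preds)  # ['adapter', 'splint_5p', 'variable', 'variable', 'splint_3p']
```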
Each value stored under a read_id key in InferenceResults is a ReadResult object. It stores the model outputs for a single read. Probabilities and predictions for both sequence-level and read-level tasks can be accessed directly from this object.
You do not need to create this class manually; it is returned when you access a read ID from `InferenceResults`:

```python
read_result = inference_results["read_id"]
```
- `seq2seq_preds`: predicted class indices for each chunk in the read (from the seq2seq task).

  ```python
  chunk_predictions = read_result.seq2seq_preds
  ```

- `classification_pred`: predicted class index for the whole read.

  💡 Tip: If you prefer the class name over the class label index, use `label_names` from the `InferenceResults` instance.

  ```python
  label_index = read_result.classification_pred
  cls_name = results.label_names[label_index]  # returns the class name
  ```

- `seq2seq_probs`: softmax probabilities for each class at each chunk position (seq2seq task).

  ```python
  probs = read_result.seq2seq_probs
  ```

- `classification_probs`: softmax probabilities for the read-level classification task.

  ```python
  probs = read_result.classification_probs
  ```

- Predictions for both tasks as a dictionary:

  ```python
  {
      "seq_class": read_result.classification_pred,
      "seq2seq": read_result.seq2seq_preds,
  }
  ```

- Probability outputs for both tasks as a dictionary:

  ```python
  {
      "seq_class": read_result.classification_probs,
      "seq2seq": read_result.seq2seq_probs,
  }
  ```

- `variable_region_range`: start and end indices (inclusive) of the predicted variable region in the signal. Returns `(-1, -1)` if no region is found.

  ```python
  start, end = read_result.variable_region_range
  ```

- `get_smoothed_seq2seq_preds()`: returns smoothed seq2seq predictions using CRF-based smoothing (if available).

  Parameters:
  - `device` (default: `'cpu'`): device to run the CRF on
  - `return_variable_region_range` (default: `False`): whether to also return the `(start, end)` range

  ```python
  smoothed_preds = read_result.get_smoothed_seq2seq_preds()
  # or with the variable region range:
  smoothed_preds, (start, end) = read_result.get_smoothed_seq2seq_preds(
      return_variable_region_range=True
  )
  ```
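A sketch of how the inclusive `(start, end)` range might be used to pull the variable-region samples out of a raw signal (the signal values are made up, and `extract_variable_region` is a hypothetical helper, not part of the package):

```python
# Hypothetical helper: slice the variable region out of a signal,
# treating (start, end) as inclusive indices and (-1, -1) as "not found".
def extract_variable_region(signal, region_range):
    start, end = region_range
    if (start, end) == (-1, -1):
        return []  # no variable region predicted for this read
    return signal[start:end + 1]  # end is inclusive, so add 1 for Python slicing

signal = [10, 12, 50, 52, 51, 11, 9]  # made-up ionic current samples
print(extract_variable_region(signal, (2, 4)))   # [50, 52, 51]
print(extract_variable_region(signal, (-1, -1))) # []
```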
The visualization module lets you visualize the results of inference. You can plot a signal and each predicted segment to get a sense of the model's performance.
```python
from trnazap.splitter import ResultsVisualizer

# Initialize an instance of 'ResultsVisualizer'
# using an instance of 'InferenceResults'
with ResultsVisualizer(results) as res_vis:
    # Use the visualize method to plot the inference results.
    # Naturally, the requested read ID must be present in the inference results.
    # You can pass either a single read_id or a list of read_ids;
    # a single matplotlib figure or a list of figures is returned accordingly.
    fig_ = res_vis.visualize('readID_123')

# OR, without the context manager:
res_vis = ResultsVisualizer(results)
fig_ = res_vis.visualize('readID_123')
fig_.close()
```
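Since `visualize` returns either a single figure or a list of figures depending on its input, a small normalization helper can keep downstream code uniform (this helper is an illustration, not part of the package):

```python
# Hypothetical helper: always work with a list of figures,
# whether visualize() returned one figure or several.
def as_figure_list(result):
    return result if isinstance(result, list) else [result]

# Works the same for a single-read call (stand-in values used here) ...
print(len(as_figure_list("fig_for_readID_123")))  # 1
# ... and for a multi-read call.
print(len(as_figure_list(["fig_a", "fig_b"])))    # 2
```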