Installation

tRNA ionic current model and alignment tools

Installation

To install the repository, refer to the following instructions:

git clone https://github.com/genometechlab/tRNA_zap.git
cd tRNA_zap
pip install -e .

Splitter Module

The Splitter Module enables classification and segmentation of nanopore ionic current signals into biologically relevant regions.

To use the module, 1) download a model and its configuration from the available models section and 2) refer to the example to learn how to use the module

Available models

Model Name	Config File	Model Weights	Description
`zap_s54_c127`	`zap_s54_c127.yaml`	`zap_s54_c127.pth`	Standard classifier trained on yeast and ecoli
`zap_s54_c49_IVTecoli`	`zap_s54_c49_IVTecoli.yaml`	`zap_s54_c49_IVTecoli.pth`	Standard classifier trained on ecoli only

Please download one of the specified models and its config file from the table above and place them in the following structure:

<your_project_root>/
├── configs/
│   └── <model_config>.yaml
├── checkpoints/
│   └── <model_weights>.pt
├── your_script.py
└── ...

💡 Tip: The path to the model weights file is specified inside the YAML configuration (checkpoint_path). If you want to move the weights to a different directory, be sure to update that path in the config file accordingly.

Inference

The inference module will ran the model on each of the read_ids
It will classify the read, as part of the classification task,
and will segment the read into variable region, ONT adapter, 3' and 5' splint regions

Inference run example

# From the splitter module, import Inference and ResultsVisualizer classes
from trnazap.splitter import Inference, ResultsVisualizer

# Specify your pod5 paths. This can be a single file or a list of directories
pod5_pth = ['Path/To/pod5/file', 'Path/to/pod5/dir1', 'Path/to/pod5/dir2', ...]

# Specify the reads you want to run inference on
desired_reads = [...]  # A list of read IDs as strings, 

# Load inference engine from a configuration file
config_pth = "/patch/to/config.yaml"

# Device used for inference. For fast inference, use a cuda-enabled GPU. 
#Can use cpu for a small number of read IDs
device = "cuda" 

infer_engine = Inference(config_pth, device=device)

# Run inference
results = infer_engine.predict(
    pod5_paths=pod5_pth,
    read_ids=desired_reads, # if not provided, will perform inference on every read ID in pod5s
    batch_size=2048, # Number of read IDs to be processed in one batch
)

You will get an InferenceResults object as the return value of Inference.predict(...) An explanation on how to use and interact with this isntance is provided below

`InferenceResults`

The InferenceResults object is a lightweight container that stores all outputs from an inference run, indexed by read ID. It also includes relevant metadata and supports basic persistence and inspection.

results = infer_engine.predict(...)

results[read_id] -> ReadResult

Returns the inference result for a specific read ID. Raises KeyError if not found.
```
read_result = results["read_abc123"]
```
To Check if a read is present
```
if "read_abc123" in results:
    ...
```
results.read_ids -> List[str]

Returns a list of all read IDs in the result set.
```
all_ids = results.read_ids
```
results.save(path: str | Path) -> None

Saves the full results object to a .pkl file.
```
results.save("/path/to/save/results.pkl")
```
InferenceResults.load(path: str | Path) -> InferenceResults (class method)

Loads a previously saved results object from disk.
```
results = InferenceResults.load("/path/to/saved/results.pkl")
```
results.summary() -> Dict[str, Any]

Returns a dictionary with summary statistics about the inference run:
- number of reads
- total chunks
- chunk size
- model type
- device
- inference timestamp
- total inference time
```
summary = results.summary()
```
results.label_names -> Dict

Returns a mappind of label indices to class names
```
summary = results.label_names
```

ReadResult

Each value corresponding to a read_id key in InferenceResults is a ReadResult object. It stores the model outputs for a single read. Probabilities and predictions for both sequence-level and read-level tasks can be accessed directly from this object.

You do not need to create this class manually — it is returned when you access a read ID from InferenceResults:

read_result = inference_results["read_id"]

read_result.seq2seq_preds -> Optional[np.ndarray]

Predicted class indices for each chunk in the read (from the seq2seq task).
```
    chunk_predictions = read_result.seq2seq_preds
```
read_result.classification_pred -> Optional[int]

Predicted class index for the whole read.

💡 Tip: If you would prefer to get the exact class names instead of the class label index, use the label_names from the InferenceResults instance
```
    label_index = read_result.classification_pred
    cls_name = results.label_names[label] # Return the class name
```
read_result.seq2seq_probs -> Optional[np.ndarray]

Softmax probabilities for each class at each chunk position (seq2seq task).
```
    probs = read_result.seq2seq_probs
```
read_result.classification_probs -> Optional[np.ndarray]

Softmax probabilities for the read-level classification task.
```
    probs = read_result.classification_probs
```

read_result.preds -> Dict[str, Any]

Dictionary containing predictions for both tasks.

    {
        "seq_class": read_result.classification_pred,
        "seq2seq": read_result.seq2seq_preds
    }

read_result.probs -> Dict[str, np.ndarray]

Dictionary containing probability outputs for both tasks.

    {
        "seq_class": read_result.classification_probs,
        "seq2seq": read_result.seq2seq_probs
    }

read_result.variable_region_range -> Tuple[int, int]

Start and end indices (inclusive) of the predicted variable region in the signal. Returns (-1, -1) if no region is found.
```
    start, end = read_result.variable_region_range
```
read_result.get_smoothed_seq2seq_preds(...) -> Optional[np.ndarray]

Returns smoothed seq2seq predictions using CRF-based smoothing (if available).

Parameters:
- device (default: 'cpu'): Device to run the CRF on
- return_variable_region_range (default: False): Whether to also return (start, end) range
```
    smoothed_preds = read_result.get_smoothed_seq2seq_preds()

    # or with variable region range:
    smoothed_preds, (start, end) = read_result.get_smoothed_seq2seq_preds(return_variable_region_range=True)
```

Visualization

The visualization module allows to visualize the results of inference. You will be able to plot a signal and each segment to get a sense of the model's performance.

Visualization run example

from trnazap.splitter import ResultsVisualizer


# Initialize an instance of 'ResultsVisualizer'
# using an isntance of 'InferenceResults'

with ResultsVisualizer(results2) as res_vis:
    # Use the visualize method to visualize the inference results
    # naturally, your requested readID should be in the inference results
    # Note that you can either pass a single read_id to this method, or a list of read_ids
    # a single matplotlib figure of a list of figures will be returned based on the input
    fig_ = res_vis.visualize('readID_123')

# OR

res_vis = ResultsVisualizer(results2)
fig_ = res_vis.visualize('readID_123')
fig_.close()

Name		Name	Last commit message	Last commit date
Latest commit History 285 Commits
checkpoints		checkpoints
configs		configs
images		images
src/trnazap		src/trnazap
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
MITLicense		MITLicense
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Table of Contents

Installation

Splitter Module

Available models

Inference

Inference run example

`InferenceResults`

`results[read_id] -> ReadResult`

`results.read_ids -> List[str]`

`results.save(path: str | Path) -> None`

`InferenceResults.load(path: str | Path) -> InferenceResults` (class method)

`results.summary() -> Dict[str, Any]`

`results.label_names -> Dict`

ReadResult

read_result.seq2seq_preds -> Optional[np.ndarray]

read_result.classification_pred -> Optional[int]

read_result.seq2seq_probs -> Optional[np.ndarray]

read_result.classification_probs -> Optional[np.ndarray]

read_result.preds -> Dict[str, Any]

read_result.probs -> Dict[str, np.ndarray]

read_result.variable_region_range -> Tuple[int, int]

read_result.get_smoothed_seq2seq_preds(...) -> Optional[np.ndarray]

Visualization

Visualization run example

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Installation

Splitter Module

Available models

Inference

Inference run example

InferenceResults

results[read_id] -> ReadResult

results.read_ids -> List[str]

results.save(path: str | Path) -> None

InferenceResults.load(path: str | Path) -> InferenceResults (class method)

results.summary() -> Dict[str, Any]

results.label_names -> Dict

ReadResult

read_result.seq2seq_preds -> Optional[np.ndarray]

read_result.classification_pred -> Optional[int]

read_result.seq2seq_probs -> Optional[np.ndarray]

read_result.classification_probs -> Optional[np.ndarray]

read_result.preds -> Dict[str, Any]

read_result.probs -> Dict[str, np.ndarray]

read_result.variable_region_range -> Tuple[int, int]

read_result.get_smoothed_seq2seq_preds(...) -> Optional[np.ndarray]

Visualization

Visualization run example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`InferenceResults`

`results[read_id] -> ReadResult`

`results.read_ids -> List[str]`

`results.save(path: str | Path) -> None`

`InferenceResults.load(path: str | Path) -> InferenceResults` (class method)

`results.summary() -> Dict[str, Any]`

`results.label_names -> Dict`

Packages