Off-Target Prediction Benchmark Reproducibility

This repository accompanies the CRISPR off-target prediction benchmark manuscript. It is intended for readers who want to understand how the benchmark was assembled, rerun the main quantitative comparison from stored tool outputs, and reproduce the manuscript figures.

The repository also hosts the Quarto documentation. It shows how to run the benchmark from precomputed tool predictions, describes the tools used in the benchmark and what to keep in mind when running them, and shows how the results in the manuscript are generated. It also includes pages that show how each manuscript figure is generated from the benchmark outputs.

Reproducibility Scope

This repository reproduces the benchmark analysis from standardized output tables from each tool, available for download via Zenodo (see below). It does not rerun each external off-target prediction tool from its native software environment. The individual tools have different installation procedures, command-line interfaces, reference genome requirements, and web/API interfaces. The commands and setup notes used to generate the standardized outputs are summarized in the Quarto page docs/06_tool_setup_reference.qmd.

The larger standardized output files with the predictions of each tool and scored candidate sites are provided separately on Zenodo:

Zenodo record: https://zenodo.org/records/20627722
Data archive URL for direct download: https://zenodo.org/records/20627722/files/offtarget_prediction_benchmark_zenodo_artifacts_v1.zip?download=1

The expected layout is explained in config/zenodo_artifacts.yml and summarized in data/zenodo/README.md. In short, the Zenodo archive should be extracted into data/zenodo/ so that its three data subfolders sit directly inside that directory. After the archive has been downloaded or symlinked there, the code in this repository can rerun the canonical standard benchmark, regenerate the benchmark summary tables, and rebuild the manuscript figures from the benchmark outputs and documented figure-specific inputs.
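The download and extraction step can also be scripted. The following is a minimal sketch, not part of the repository: the helper names download_archive and extract_archive and the local filename zenodo_artifacts.zip are hypothetical, and it assumes the archive unpacks its subfolders at the top level, as described above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Direct-download URL quoted in this README.
ARCHIVE_URL = (
    "https://zenodo.org/records/20627722/files/"
    "offtarget_prediction_benchmark_zenodo_artifacts_v1.zip?download=1"
)


def download_archive(url: str, zip_path: Path) -> Path:
    """Download the archive to zip_path unless a file is already there."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    if not zip_path.exists():
        urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_archive(zip_path: Path, dest: Path) -> list[str]:
    """Extract zip_path into dest and return the sorted names found in dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

Run from the repository root, extract_archive(download_archive(ARCHIVE_URL, Path("zenodo_artifacts.zip")), Path("data/zenodo")) would return the entries now present in data/zenodo/, which can be compared by eye against the layout in data/zenodo/README.md.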

Repository Contents

The repository contains three main components of the analysis:

the canonical filtered truth table used for the human full-cohort benchmark, which is the basis for all analysis in the manuscript,
the Python code that evaluates standardized tool outputs against that truth table,
and a Quarto documentation site that illustrates the workflows and reproduces the manuscript figures one by one.

Repository Inputs

1. GitHub-tracked compact inputs

These are small enough to be included directly in the repository. The main one is:

data/manuscript/manuscript_primary.csv

This table is the canonical filtered ground truth set used in the standard human benchmark in the manuscript.
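Before running the benchmark, it can be useful to confirm that the truth table is present and readable. The sketch below is only an illustration: summarize_csv is a hypothetical helper, and it makes no assumption about the column names in the file.

```python
import csv
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return the header columns and the number of non-empty data rows."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # Blank lines are parsed as empty lists and are not counted.
        n_rows = sum(1 for row in reader if row)
    return header, n_rows
```

Calling summarize_csv(Path("data/manuscript/manuscript_primary.csv")) from the repository root would return the column names and the row count of the ground truth set.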

2. Zenodo-backed standardized tool outputs

The full benchmark also requires standardized contract files for each off-target prediction tool. These files are called

prediction_contract_<tool>.csv

and should be downloaded from Zenodo and saved or symlinked under:

data/zenodo/standard_tool_predictions/

There are additional Zenodo files for specific figures, which are too large to track directly in the GitHub repository. These include the scored candidate tables for Figure 2 and the no-bulge machine learning (ML) prediction tool contracts used by Figure 5 Panel C. The full expected file list is stored in config/zenodo_artifacts.yml; the standard benchmark contract filenames are also listed in config/tool_output_manifest.example.yml.
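A quick way to see which contract files are in place is to scan the directory for the prediction_contract_<tool>.csv naming pattern. This is a minimal sketch with a hypothetical helper, find_contracts; it only lists what is on disk and does not compare against config/zenodo_artifacts.yml, whose schema is not reproduced here.

```python
from pathlib import Path

PREFIX = "prediction_contract_"


def find_contracts(directory: Path) -> dict[str, Path]:
    """Map each tool name to its prediction_contract_<tool>.csv file."""
    contracts = {}
    for path in sorted(directory.glob(f"{PREFIX}*.csv")):
        # The tool name is whatever follows the prefix in the file stem.
        tool = path.stem[len(PREFIX):]
        contracts[tool] = path
    return contracts
```

For example, find_contracts(Path("data/zenodo/standard_tool_predictions")) returns an empty dictionary if the Zenodo files have not been extracted yet, which is an easy condition to check before starting a benchmark run.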

Quick start

From the repository root:

pip install -e .
python scripts/run_manuscript_benchmark.py --help

Once the Zenodo files are in place, the canonical rerun is:

python scripts/run_manuscript_benchmark.py

To render the documentation site and execute the figure pages:

python scripts/render_docs.py

This command installs a project-local Jupyter kernel inside .jupyter/ and then runs Quarto with the repository's Python environment.

Reproducing the figures

Figures 3, 4, and 6 are rebuilt from benchmark run outputs.
Figure 6 also uses the no-bulge ML comparison recall-curve output for the machine learning tools.
Figure 5 starts from the standardized prediction contracts, because it measures whether each validated true site was evaluated by each tool before any rank cutoff is applied.
Figure 1 describes the benchmark cohort itself and therefore starts from the truth/input layer (GitHub-tracked file in the data folder).
Figure 2 requires a broader scored candidate layer than the benchmark summaries alone and is documented separately (file stored on Zenodo).

Repository structure

src/offtarget_benchmark/: benchmark runner, helper functions, and plotting code
scripts/: command-line scripts for running the benchmark and rendering docs
data/: compact benchmark inputs plus the Zenodo-backed data directories
results/benchmark_runs/: benchmark outputs used by the figure pages
docs/: Quarto tutorials and figure walkthroughs
