This document describes how to reproduce the evaluation artifacts shipped with this repository. All commands assume you are in the repo root.

Create a virtual environment and install the package in editable mode:

```bash
python -m venv .venv_linux
./.venv_linux/bin/python -m pip install -U pip
./.venv_linux/bin/python -m pip install -e .
```

Optional (JupyterLab UI):
```bash
cd labextension
npm install
npm run stage
```

The O1 coverage artifacts live under evaluation/o1_coverage/ and are regenerated with:
```bash
./.venv_linux/bin/python evaluation/o1_coverage/run_coverage.py
```

This regenerates:

- evaluation/o1_coverage/gold_predictions/predicted_*.json
- evaluation/o1_coverage/coverage_results.json
- evaluation/o1_coverage/coverage_results.md
- evaluation/o1_coverage/coverage_results_all.json
- evaluation/o1_coverage/coverage_results_all.md
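To sanity-check a fresh run before reading the Markdown reports, a minimal sketch (it assumes only that coverage_results.json is valid JSON; no particular schema is assumed):

```python
import json
from pathlib import Path

# Load the regenerated summary and show its top-level shape. The exact
# schema is whatever run_coverage.py emits; nothing here depends on
# specific keys.
results = json.loads(Path("evaluation/o1_coverage/coverage_results.json").read_text())
if isinstance(results, dict):
    print("top-level keys:", sorted(results))
else:
    print(f"{len(results)} top-level entries")
```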
Gold templates used as the reference are stored in
evaluation/o1_coverage/gold_templates/ and point to the notebooks listed in
evaluation/README.md.
The O2 user study is manual and uses the protocol plus sanitized notebooks in:

- Protocol: evaluation/o2_user_study/user_study_protocol.md
- Notebooks: evaluation/o2_user_study/user_study/*.ipynb
- Results sheet (filled manually during the study): evaluation/o2_user_study/user_study_results.xlsx
Run the study following the protocol and append new rows in the spreadsheet.
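If you want a quick summary of the sheet after a session, a small sketch using pandas (an assumption: pandas and openpyxl are available in the environment; the column layout is defined by the protocol and not assumed here):

```python
import pandas as pd  # reading .xlsx also requires openpyxl

# Load the manually filled results sheet and report its shape, so new
# rows from the latest session are visible at a glance.
df = pd.read_excel("evaluation/o2_user_study/user_study_results.xlsx")
print(f"{len(df)} rows")
print("columns:", list(df.columns))
```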
Synthetic notebooks used for the O3 latency benchmarks are in evaluation/o3_benchmarks/notebooks/.
Run the benchmarks with:
```bash
./.venv_linux/bin/python evaluation/o3_benchmarks/run_latency_benchmarks.py
```

Outputs:

- evaluation/o3_benchmarks/benchmark_results.json
- evaluation/o3_benchmarks/benchmark_results.md
- evaluation/o3_benchmarks/benchmark_results_examples.json
- evaluation/o3_benchmarks/benchmark_results_examples.md
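For orientation, the general shape of such a latency measurement is sketched below; this is illustrative only, not the actual logic of run_latency_benchmarks.py.

```python
import statistics
import time

def measure(op, runs=50):
    """Time `op` repeatedly and return median and p95 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }

# Stand-in workload; replace with the operation under test.
print(measure(lambda: sum(range(100_000))))
```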
Start a local Fuseki instance (example):

```bash
java -jar fuseki-server.jar --mem /cellscope
```

Then collect SPARQL latency numbers:

```bash
./.venv_linux/bin/python evaluation/o3_benchmarks/run_sparql_latency.py \
    --out-json evaluation/o3_benchmarks/sparql_latency.json \
    --out-md evaluation/o3_benchmarks/sparql_latency.md
```
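To poke at the endpoint directly, a minimal sketch using requests against Fuseki's standard query endpoint (the /cellscope dataset name comes from the command above; the query itself is illustrative):

```python
import requests

# Fuseki serves SPARQL queries at http://<host>:3030/<dataset>/query.
# This illustrative query counts the triples currently in the store.
resp = requests.post(
    "http://localhost:3030/cellscope/query",
    data={"query": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
print("triples:", resp.json()["results"]["bindings"][0]["n"]["value"])
```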
Load existing crates from exports/ into Fuseki:

```bash
./.venv_linux/bin/python evaluation/o3_benchmarks/load_exports_to_sparql.py \
    --endpoint http://localhost:3030/cellscope/update \
    --out evaluation/o3_benchmarks/index_results.json
```
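load_exports_to_sparql.py is the authoritative loader; for orientation only, here is a generic sketch of what loading a crate into a SPARQL store amounts to (the crate path is a placeholder, and rdflib with JSON-LD support is assumed):

```python
import requests
from rdflib import Graph  # rdflib >= 6 ships a JSON-LD parser

# Parse a crate's JSON-LD metadata (path is a placeholder) and push the
# resulting triples through Fuseki's SPARQL Update endpoint.
g = Graph()
g.parse("exports/EXAMPLE/ro-crate/ro-crate-metadata.json", format="json-ld")
update = "INSERT DATA {\n%s}" % g.serialize(format="nt")
requests.post(
    "http://localhost:3030/cellscope/update",
    data={"update": update},
).raise_for_status()
print(f"inserted {len(g)} triples")
```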
Then re-run latency collection:

```bash
./.venv_linux/bin/python evaluation/o3_benchmarks/run_sparql_latency.py \
    --out-json evaluation/o3_benchmarks/sparql_latency_loaded.json \
    --out-md evaluation/o3_benchmarks/sparql_latency_loaded.md
```

Representative RO-Crates (with the offline HTML graph viewer) are stored under exports/*/ro-crate/.
A quick end-to-end smoke test is available:

```bash
./.venv_linux/bin/python scripts/run_full_test.py
```