Reproduction Guide

This document describes how to reproduce the evaluation artifacts shipped with this repository. All commands assume you are in the repo root.

0) Environment setup

python -m venv .venv_linux
./.venv_linux/bin/python -m pip install -U pip
./.venv_linux/bin/python -m pip install -e .
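
To confirm the editable install succeeded, list the packages in the virtual environment (the project's distribution name is whatever its packaging declares):

./.venv_linux/bin/python -m pip list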

Optional (JupyterLab UI):

cd labextension
npm install
npm run stage
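
If JupyterLab is installed in the virtual environment, you can check whether the extension is visible to it; how the staged extension gets linked into JupyterLab depends on the project's packaging, so treat this as an optional sketch:

./.venv_linux/bin/jupyter labextension list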

O1 Coverage (gold labels vs. predictions)

Artifacts live under evaluation/o1_coverage/ and are regenerated with:

./.venv_linux/bin/python evaluation/o1_coverage/run_coverage.py

This produces:

  • evaluation/o1_coverage/gold_predictions/predicted_*.json
  • evaluation/o1_coverage/coverage_results.json
  • evaluation/o1_coverage/coverage_results.md
  • evaluation/o1_coverage/coverage_results_all.json
  • evaluation/o1_coverage/coverage_results_all.md

Gold templates used as the reference are stored in evaluation/o1_coverage/gold_templates/ and point to the notebooks listed in evaluation/README.md.
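
The schema of the coverage files is defined by run_coverage.py; a minimal, schema-agnostic sketch for peeking at the regenerated results:

./.venv_linux/bin/python - <<'EOF'
import json
from pathlib import Path
# Load the summary file and show its top-level shape without assuming a schema
data = json.loads(Path("evaluation/o1_coverage/coverage_results.json").read_text())
if isinstance(data, dict):
    print("keys:", sorted(data))
elif isinstance(data, list):
    print("entries:", len(data))
EOF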

O2 User study (materials + results)

The study is manual and uses the protocol plus sanitized notebooks in:

  • Protocol: evaluation/o2_user_study/user_study_protocol.md
  • Notebooks: evaluation/o2_user_study/user_study/*.ipynb
  • Results sheet (filled manually during the study): evaluation/o2_user_study/user_study_results.xlsx

Run the study following the protocol and append new rows to the results spreadsheet.
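
If you want to sanity-check the spreadsheet programmatically, a small sketch assuming pandas and openpyxl are installed in the venv (neither is required by the protocol itself):

./.venv_linux/bin/python - <<'EOF'
import pandas as pd
# Read the manually maintained results sheet; column names are whatever the study defines
df = pd.read_excel("evaluation/o2_user_study/user_study_results.xlsx")
print(df.shape[0], "rows;", list(df.columns))
EOF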

O3 Latency benchmarks (local)

The synthetic notebooks used for the latency measurements are in evaluation/o3_benchmarks/notebooks/. Run the benchmarks with:

./.venv_linux/bin/python evaluation/o3_benchmarks/run_latency_benchmarks.py

Outputs:

  • evaluation/o3_benchmarks/benchmark_results.json
  • evaluation/o3_benchmarks/benchmark_results.md
  • evaluation/o3_benchmarks/benchmark_results_examples.json
  • evaluation/o3_benchmarks/benchmark_results_examples.md
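
Because the benchmark script overwrites the committed artifacts in place, a quick way to compare a fresh run against the shipped results is a plain git diff (latency numbers are machine-dependent, so differences are expected):

git diff -- evaluation/o3_benchmarks/benchmark_results.json evaluation/o3_benchmarks/benchmark_results.md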

O3 Latency benchmarks (SPARQL)

Start a local Fuseki instance (example):

java -jar fuseki-server.jar --mem /cellscope
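
Before collecting latency numbers, it can help to confirm the dataset endpoint answers; this assumes Fuseki's default port 3030 and the /cellscope dataset name from the command above:

curl -s --data-urlencode 'query=ASK {}' http://localhost:3030/cellscope/query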

Empty store latency

./.venv_linux/bin/python evaluation/o3_benchmarks/run_sparql_latency.py \
  --out-json evaluation/o3_benchmarks/sparql_latency.json \
  --out-md evaluation/o3_benchmarks/sparql_latency.md

Loaded store latency

Load existing crates from exports/ into Fuseki:

./.venv_linux/bin/python evaluation/o3_benchmarks/load_exports_to_sparql.py \
  --endpoint http://localhost:3030/cellscope/update \
  --out evaluation/o3_benchmarks/index_results.json
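
To verify the load actually populated the store before re-running the latency collection, a simple triple count against the dataset's query endpoint (standard SPARQL; adjust the URL if you used a different dataset name):

curl -s --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }' \
  http://localhost:3030/cellscope/query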

Then re-run latency collection:

./.venv_linux/bin/python evaluation/o3_benchmarks/run_sparql_latency.py \
  --out-json evaluation/o3_benchmarks/sparql_latency_loaded.json \
  --out-md evaluation/o3_benchmarks/sparql_latency_loaded.md

Representative exports

Representative RO-Crates (with the offline HTML graph viewer) are stored under exports/*/ro-crate/.
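
A quick structural check of the exported crates; this only assumes the standard ro-crate-metadata.json file name at each crate root:

./.venv_linux/bin/python - <<'EOF'
import json
from pathlib import Path
# Report, for each exported crate, whether its RO-Crate metadata file is present and parses
for crate in sorted(Path("exports").glob("*/ro-crate")):
    meta = crate / "ro-crate-metadata.json"
    if meta.is_file():
        json.loads(meta.read_text())  # raises if the metadata file is not valid JSON
        print(crate, "ok")
    else:
        print(crate, "missing ro-crate-metadata.json")
EOF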

Sanity check (optional)

A quick end-to-end smoke test is available:

./.venv_linux/bin/python scripts/run_full_test.py