Bake provenance JSON into images and auto-populate post_results.py args #338

Open
misiugodfrey wants to merge 2 commits into main from misiug/benchmarkProvenance

@misiugodfrey
Contributor

Summary

  • Writes /opt/velox-testing/provenance.json into each Presto image at build time (in both native_build.dockerfile and provenance_labels.dockerfile), so run_context.py can read it from the container filesystem in both Docker and SLURM/Enroot environments. Enroot .sqsh images cannot expose OCI metadata such as labels, but the baked-in file is accessible through the normal filesystem.
  • run_context.py reads the provenance file and merges presto_sha, presto_branch, presto_repo and velox_sha, velox_branch, velox_repo into the benchmark context dict written to benchmark_result.json.
  • post_results.py auto-populates --velox-branch, --velox-repo, --presto-branch, and --presto-repo from the benchmark_result.json context. CLI args still take precedence. Older result files without provenance fields do not cause an error; the fields are simply absent from engine_config.
  • presto-build.yml replaces the labels: input on the deps and coordinator build steps with a second provenance_labels.dockerfile wrapper step. The buildx labels: input applies OCI metadata only and does not execute RUN steps, so the provenance file would not have been written via that path.
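The read-and-merge step described for run_context.py could look roughly like the sketch below. This is not the code in the PR: the function names `load_provenance` and `merge_provenance` are hypothetical, and the flat key layout of provenance.json is an assumption based on the field names listed above. Only the file path comes from the description.

```python
import json
from pathlib import Path

# Path taken from the PR description.
PROVENANCE_PATH = Path("/opt/velox-testing/provenance.json")

# Assumed key names, derived from the fields named in the summary.
PROVENANCE_KEYS = (
    "presto_sha", "presto_branch", "presto_repo",
    "velox_sha", "velox_branch", "velox_repo",
)


def load_provenance(path=PROVENANCE_PATH):
    """Return the known provenance fields, or {} if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, ValueError):
        # Missing file (older image) or malformed JSON: report no provenance.
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only recognized keys that carry a non-empty value.
    return {key: data[key] for key in PROVENANCE_KEYS if data.get(key)}


def merge_provenance(context, provenance):
    """Return a copy of the benchmark context with provenance fields added."""
    merged = dict(context)
    merged.update(provenance)
    return merged
```

Returning an empty dict on failure keeps images built before this change usable: the context is simply written without provenance fields.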

Test plan

  • Build a new native worker image: presto/scripts/start_native_cpu_presto.sh
  • Confirm provenance file exists: docker run --rm presto-native-worker-cpu:$USER cat /opt/velox-testing/provenance.json
  • Run a short benchmark: presto/scripts/run_benchmark.sh -b tpch -s bench_sf1
  • Confirm benchmark_result.json context has velox_branch, velox_repo, presto_branch, presto_repo populated
  • Run post_results.py --dry-run without branch/repo args; confirm payload engine_config shows values auto-populated from the result file
  • Run with a --velox-branch override; confirm the CLI value wins over the value from the result file
  • Test with an older result file (no provenance in context); confirm the fields are absent from engine_config without error
  • Trigger a CI build to produce images via the updated presto-build.yml; pull the resulting image into a SLURM/Enroot context and run a quick benchmark to confirm provenance fields are populated via the filesystem path
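The override and backward-compatibility checks in this plan can be illustrated with a small sketch of the precedence rule. It is an assumption about how post_results.py might resolve values, not its actual implementation: `build_parser` and `resolve_engine_config` are hypothetical names, and only the four flag names come from the description.

```python
import argparse

# Fields that may come from the CLI or from the result file's context.
FIELDS = ("velox_branch", "velox_repo", "presto_branch", "presto_repo")


def build_parser():
    """Build a parser exposing --velox-branch, --velox-repo, and so on."""
    parser = argparse.ArgumentParser()
    for field in FIELDS:
        # A None default lets us tell "not given" apart from a real value.
        parser.add_argument("--" + field.replace("_", "-"), default=None)
    return parser


def resolve_engine_config(args, context):
    """Prefer CLI values, fall back to the context, omit fields found in neither."""
    config = {}
    for field in FIELDS:
        cli_value = getattr(args, field)
        value = cli_value if cli_value is not None else context.get(field)
        if value is not None:
            config[field] = value
    return config


if __name__ == "__main__":
    context = {"velox_branch": "main", "presto_branch": "master"}
    args = build_parser().parse_args(["--velox-branch", "my-feature"])
    print(resolve_engine_config(args, context))
```

With an empty context, as in an older result file, the function returns an empty dict instead of raising, which matches the behaviour the plan expects.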

Writes /opt/velox-testing/provenance.json into each Presto image at build
time (both native_build.dockerfile and provenance_labels.dockerfile), so
run_context.py can read it from the container filesystem in both Docker and
SLURM/Enroot environments.

run_context.py: reads the provenance file and merges presto_sha/branch/repo
and velox_sha/branch/repo into the benchmark context dict.

post_results.py: auto-populates --velox-branch, --velox-repo, --presto-branch,
--presto-repo from the benchmark_result.json context section; CLI args still
take precedence. BenchmarkMetadata gains 6 optional provenance fields.

presto-build.yml: replaces the labels: input on the deps and coordinator build
steps with a second provenance_labels.dockerfile wrapper step, so CI and local
builds both write the provenance file (buildx labels: applies OCI metadata only
and does not execute RUN steps).
@misiugodfrey requested a review from a team as a code owner on May 7, 2026 18:48