Skip to content

Process bare metal job results#680

Open
epompeii wants to merge 5 commits intodevelfrom
u/ep/job-results
Open

Process bare metal job results#680
epompeii wants to merge 5 commits intodevelfrom
u/ep/job-results

Conversation

@epompeii
Copy link
Member

This changeset add processing of bare metal job results.
The Report is created initially. Then once the job is completed, results are connected to the Report.

@github-actions
Copy link

github-actions bot commented Feb 26, 2026

🤖 Claude Code Review

PR: #680
Base: devel
Head: u/ep/job-results
Commit: 02352378efab7e76de5c5247b078228b02199c3c


Pull Request Review: Job Processing & Multi-Iteration Support

Overview

This PR (5 commits, 85 files, +4941/-1563) adds:

  • Multi-iteration job execution support (Iteration type, JsonIterationOutput)
  • New Processed job status (between Completed and Failed)
  • Shared WebSocket wire protocol types (RunnerMessage, ServerMessage, CloseReason)
  • Consolidated auth helpers (bearer_header, strip_bearer_token)
  • Comprehensive test expansion across all test suites

High Severity

1. Missing #[cfg(feature = "plus")] — will break cargo check --no-default-features

lib/bencher_schema/src/model/runner/job.rs

Three functions reference plus-only APIs (context.licensor, context.oci_storage(), PublicUser) but lack the feature gate:

  • QueryJob::process_results
  • reprocess_completed_jobs
  • reprocess_single_completed_job

The nearby mark_orphaned_completed_as_failed correctly has #[cfg(feature = "plus")], showing the pattern exists but was inconsistently applied. CLAUDE.md requires cargo check --no-default-features to pass.

2. JobStatus discriminant shift — existing DB rows will be misinterpreted without migration

lib/bencher_json/src/runner/job_status.rs

The integer mappings changed: Failed moved from 4→5, Canceled from 5→6, and Processed=4 was inserted. The migration 2026-02-27-120000_job_processed/up.sql correctly remaps values (UPDATE job SET status = 6 WHERE status = 5; UPDATE job SET status = 5 WHERE status = 4). Verify this migration runs before the new code is deployed — without it, existing Failed (4) rows would be read as Processed.

3. JsonJobOutput backwards compatibility broken

lib/bencher_json/src/runner/job.rs

The old flat format ({"exit_code":0,"stdout":"hello","stderr":"world","output":{...}}) was replaced with {"results":[...]}. The old backwards_compat_completed_json test now tests the new format, not the old one. If any old-format blobs exist in OCI storage, they will fail to deserialize. Either add a #[serde(untagged)] fallback deserializer or verify no old-format data exists in production.


Medium Severity

4. allow_failure with all iterations erroring silently succeeds

plus/bencher_runner/src/up/job.rs:108-119

When allow_failure=true and every iteration fails, the results vector is empty and the job is sent as RunnerMessage::Completed { results: [] }. This produces zero metrics rather than a failure. Consider at minimum documenting this behavior, or treating all-iterations-failed as an error even with allow_failure.

5. handle_running returns Ack for unexpected non-canceled states

plus/api_runners/src/jobs/websocket.rs:376-378

When the Running transition fails and the job is not canceled, the handler logs a warning and returns Ok(None) (which sends Ack). The runner will think the transition succeeded and proceed to execute. Consider returning an error response or ServerMessage::Failed instead.

6. Iteration(0) is allowed but semantically meaningless

lib/bencher_json/src/runner/job.rs, lib/bencher_json/src/project/report.rs

Iteration::from_str("0") succeeds. Callers guard with job.config.iter.map_or(1, ...) but the type itself permits zero. Consider validating iter >= 1 at the type/deserialization layer per CLAUDE.md guidance on strong types.


Low Severity

7. write_conn! used for read-only query

lib/bencher_schema/src/model/runner/job.rs:918reprocess_completed_jobs uses write_conn!(context) for a SELECT query. Should use auth_conn! to avoid unnecessarily acquiring the exclusive write lock.

8. Test doc comment mismatch

plus/api_runners/tests/jobs/websocket.rs:950 — Comment says "Output message with stdout only" but the test is channel_completed_with_stderr_only. Minor copy-paste error.

9. last_exit_code defaults to 0

plus/bencher_runner/src/up/job.rs:78 — If no iterations execute (e.g., immediate cancel), last_exit_code remains 0, suggesting success. Mitigated by the cancel path returning JobOutcome::Canceled.

10. Missing #[typeshare] / JsonSchema on WebSocket types

lib/bencher_json/src/runner/websocket.rsRunnerMessage, ServerMessage, and CloseReason lack #[typeshare::typeshare] and JsonSchema derives. Fine if these are purely server/runner-side, but worth confirming they don't need TypeScript or OpenAPI representation.


Positive Observations

  • Excellent test coverage — The WebSocket test suite is comprehensive: auth, happy path, cancellation, race conditions (concurrent cancel during Running, idempotent duplicates), timeouts, token rotation, and multi-iteration variants.
  • Clean UpdateJob helper refactoringterminate, set_status, heartbeat, start, claim, execute_if_status, execute_if_either_status reduce duplication and encapsulate TOCTOU-safe conditional UPDATEs.
  • Proper wire protocol consolidation — Moving RunnerMessage/ServerMessage/CloseReason into bencher_json eliminates client/server type drift.
  • Security improvementsstrip_bearer_token uses case-insensitive comparison per RFC 7235; BLOCKED_ENV_VARS prevents LD_PRELOAD injection; WebSocket message size is bounded.
  • Good concurrency model — Conditional UPDATEs with status filters prevent TOCTOU races; heartbeat timeout safety nets catch disconnected runners.
  • CLAUDE.md compliancethiserror throughout (no anyhow), #[expect] over #[allow], strong types (JobId, JobUuid, Iteration, DateTime), BTreeMap for deterministic serialization, camino paths, proper destructuring in TryFrom impls, deterministic time in tests via Clock::Custom and tokio::time::pause().
  • Migration quality — Correct reverse-order remapping to avoid collisions; report_head_index well-documented.

Action Items

Priority Item
Must fix Add #[cfg(feature = "plus")] to process_results, reprocess_completed_jobs, reprocess_single_completed_job (#1)
Must verify Migration 2026-02-27-120000_job_processed runs before deploy (#2)
Must verify No old-format JsonJobOutput blobs exist in production OCI storage (#3)
Should fix Handle unexpected state in handle_running more explicitly (#5)
Should fix Fix test doc comment (#8)
Consider Validate Iteration >= 1 at type level (#6)
Consider Document allow_failure + all-fail behavior (#4)
Consider Use auth_conn! for read query (#7)

Model: claude-opus-4-6

@github-actions
Copy link

github-actions bot commented Feb 26, 2026

🐰 Bencher Report

Branchu/ep/job-results
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
microseconds (µs)
(Result Δ%)
Upper Boundary
microseconds (µs)
(Limit %)
Adapter::Json📈 view plot
🚷 view threshold
3.75 µs
(+14.94%)Baseline: 3.26 µs
4.42 µs
(84.92%)
Adapter::Magic (JSON)📈 view plot
🚷 view threshold
3.72 µs
(+13.58%)Baseline: 3.27 µs
4.38 µs
(85.00%)
Adapter::Magic (Rust)📈 view plot
🚷 view threshold
25.75 µs
(+2.14%)Baseline: 25.21 µs
31.28 µs
(82.34%)
Adapter::Rust📈 view plot
🚷 view threshold
2.99 µs
(+6.96%)Baseline: 2.80 µs
3.49 µs
(85.76%)
Adapter::RustBench📈 view plot
🚷 view threshold
2.98 µs
(+6.77%)Baseline: 2.80 µs
3.47 µs
(85.95%)
head_version_insert/batch/10📈 view plot
🚷 view threshold
106.63 µs
(+11.21%)Baseline: 95.88 µs
128.79 µs
(82.79%)
head_version_insert/batch/100📈 view plot
🚷 view threshold
245.60 µs
(+6.02%)Baseline: 231.66 µs
270.06 µs
(90.94%)
head_version_insert/batch/255📈 view plot
🚷 view threshold
469.51 µs
(+2.89%)Baseline: 456.33 µs
500.96 µs
(93.72%)
head_version_insert/batch/50📈 view plot
🚷 view threshold
164.54 µs
(+5.92%)Baseline: 155.35 µs
186.77 µs
(88.10%)
threshold_query/join/10📈 view plot
🚷 view threshold
154.21 µs
(+11.25%)Baseline: 138.62 µs
174.67 µs
(88.28%)
threshold_query/join/20📈 view plot
🚷 view threshold
167.97 µs
(+10.09%)Baseline: 152.57 µs
188.71 µs
(89.01%)
threshold_query/join/5📈 view plot
🚷 view threshold
143.25 µs
(+9.22%)Baseline: 131.16 µs
163.95 µs
(87.38%)
threshold_query/join/50📈 view plot
🚷 view threshold
215.68 µs
(+11.64%)Baseline: 193.19 µs
232.94 µs
(92.59%)
🐰 View full continuous benchmarking report in Bencher

@epompeii epompeii marked this pull request as ready for review February 28, 2026 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant