Fix Eval trace structure inconsistencies #119

Merged
David Elner (delner) merged 1 commit into main from fix/eval_trace_structure
Mar 17, 2026

@delner
Collaborator

The Eval trace structure has numerous inconsistencies when compared to the traces produced by the Java SDK (the other OTel-based implementation), Python, and TypeScript.

Changes

| Issue | Before | After |
| --- | --- | --- |
| Single shared score span | All scorers ran inside one `"score"` span with all scores aggregated on it | Each scorer gets its own `"score"` span as a direct child of the eval span, matching Java/Python/TS |
| Missing `purpose: "scorer"` on score spans | `span_attributes` only had `{type: "score"}` | `span_attributes` includes `{type: "score", name: scorer_name, purpose: "scorer"}`; the platform uses this to filter scorer spans from cost/latency calculations |
| Missing scorer input/output on score spans | Score span had no `input_json` or `output_json` | Each score span logs `input_json` (input, expected, output, metadata) and `output_json` (scores hash), matching Python/TS expected output |
| Eval span `input_json` not wrapped | Raw value (e.g., `"hello"`) | Wrapped as `{input: "hello"}`, matching Java SDK |
| Eval span `output_json` not wrapped | Raw value (e.g., `"HELLO"`) | Wrapped as `{output: "HELLO"}`, matching Java SDK |
| Missing metadata on eval span | Case metadata not logged on the eval span | Case metadata set as `braintrust.metadata` on the eval span, matching Java SDK |
| Missing `output_json` on eval span when task errors | No `output_json` attribute set at all | Sets `{output: null}`, matching Java SDK |
| Eval span attributes missing on task error | `span_attributes`, `input_json`, `expected`, `metadata`, `origin` were set after task+scorers, so they were skipped on task error | All known attributes set before task execution so they're present regardless of task outcome |
| Eval spans not isolated from ambient trace context | Used `tracer.in_span("eval")`, which inherits any active parent span (e.g., a Sidekiq job span) | Uses `tracer.start_root_span("eval")` so each eval case starts its own independent trace, matching Java's `setNoParent()` |
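
For readers who want to see the resulting attribute shapes, here is a rough sketch of the "After" column in plain Ruby. It is not the SDK's code. `FakeSpan`, `run_case`, and `score_span_attributes` are hypothetical names. The `braintrust.` prefix on every key other than `braintrust.metadata` is an assumption, as are the JSON encoding of `expected` and `metadata` and the choice to swallow the task error. `span_attributes` and `origin` on the eval span are left out because the description does not give their values.

```ruby
require "json"

# Hypothetical stand-in for an OpenTelemetry span; it only records attributes.
class FakeSpan
  attr_reader :name, :attributes

  def initialize(name)
    @name = name
    @attributes = {}
  end

  def set_attribute(key, value)
    @attributes[key] = value
  end
end

# Runs one eval case against an already-started eval span.
# Attribute keys other than "braintrust.metadata" are assumed, not confirmed.
def run_case(eval_span, input:, expected:, metadata:, task:)
  # Set everything known up front so it survives a task error.
  eval_span.set_attribute("braintrust.input_json", JSON.generate({ input: input }))
  eval_span.set_attribute("braintrust.expected", JSON.generate(expected))
  eval_span.set_attribute("braintrust.metadata", JSON.generate(metadata))

  output = task.call(input)
  eval_span.set_attribute("braintrust.output_json", JSON.generate({ output: output }))
  output
rescue StandardError
  # On task error the output is still logged, wrapped around null.
  # Swallowing the error here is a simplification for the sketch.
  eval_span.set_attribute("braintrust.output_json", JSON.generate({ output: nil }))
  nil
end

# Builds the attributes for one scorer's own "score" span.
def score_span_attributes(scorer_name:, input:, expected:, output:, metadata:, scores:)
  {
    "braintrust.span_attributes" =>
      JSON.generate({ type: "score", name: scorer_name, purpose: "scorer" }),
    "braintrust.input_json" =>
      JSON.generate({ input: input, expected: expected, output: output, metadata: metadata }),
    "braintrust.output_json" => JSON.generate(scores)
  }
end

# Usage: a failing task still leaves input, expected, metadata, and output on the span.
failed = FakeSpan.new("eval")
run_case(failed, input: "hello", expected: "HELLO", metadata: { "case" => 1 },
                 task: ->(_input) { raise "boom" })
puts failed.attributes["braintrust.output_json"]
```

In the real SDK the eval span would come from `tracer.start_root_span("eval")`, and each score span would be started as a child of it, one per scorer.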

@delner David Elner (delner) merged commit 2d9c2a7 into main Mar 17, 2026
7 checks passed
@delner David Elner (delner) deleted the fix/eval_trace_structure branch March 17, 2026 20:03

Labels

bug Something isn't working