README.md: 28 additions, 1 deletion
@@ -259,6 +259,8 @@ Braintrust::Eval.run(
 )
 ```

+See [eval.rb](./examples/eval.rb) for a full example.
+
 ### Datasets

 Use test cases from a Braintrust dataset:
@@ -287,6 +289,8 @@ Braintrust::Eval.run(
 )
 ```

+See [dataset.rb](./examples/eval/dataset.rb) for a full example.
+
 ### Scorers

 Use scoring functions defined in Braintrust:
@@ -315,6 +319,8 @@ Braintrust::Eval.run(
 )
 ```

+See [remote_functions.rb](./examples/eval/remote_functions.rb) for a full example.
+
 #### Scorer metadata

 Scorers can return a Hash with `:score` and `:metadata` to attach structured context to the score. The metadata is logged on the scorer's span and visible in the Braintrust UI for debugging and filtering:
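The return shape described above can be sketched in plain Ruby. This is a minimal illustration, not the SDK's own example: the lambda only stands in for a scorer block, it does not call the Braintrust gem, and the metadata keys (`:reason`, `:expected_length`, `:output_length`) are hypothetical.

```ruby
# Plain-Ruby stand-in for a scorer block; runs without the Braintrust gem.
# Assumption: the metadata keys below are invented for illustration only.
EXACT_MATCH = lambda do |output:, expected:|
  matched = output.to_s.strip.casecmp?(expected.to_s.strip)
  {
    score: matched ? 1.0 : 0.0,
    metadata: {
      reason: matched ? "case-insensitive match" : "mismatch",
      expected_length: expected.to_s.length,
      output_length: output.to_s.length
    }
  }
end

result = EXACT_MATCH.call(output: "Paris ", expected: "paris")
puts result[:score]
puts result[:metadata][:reason]
```

Keeping the numeric score and the explanation in one Hash means the reason for a low score travels with it, which is what makes the metadata useful for filtering later.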
@@ -332,6 +338,27 @@ end

 See [scorer_metadata.rb](./examples/eval/scorer_metadata.rb) for a full example.

+#### Multiple scores from one scorer
+
+When several scores can be computed together (e.g. in one LLM call), you can return an `Array` of score `Hash`es instead of a single value. Each metric appears as a separate score column in the Braintrust UI:
+
+```ruby
+Braintrust::Scorer.new("summary_quality") do |output:, expected:|
+`name` and `score` are required; `metadata` is optional.
+
+See [multi_score.rb](./examples/eval/multi_score.rb) for a full example.
+
 #### Trace scoring

 Scorers can access the full evaluation trace (all spans generated by the task) by declaring a `trace:` keyword parameter. This is useful for inspecting intermediate LLM calls, validating tool usage, or checking the message thread:
@@ -361,7 +388,7 @@ Braintrust::Eval.run(
 )
 ```

-See examples: [eval.rb](./examples/eval.rb), [dataset.rb](./examples/eval/dataset.rb), [remote_functions.rb](./examples/eval/remote_functions.rb), [trace_scoring.rb](./examples/eval/trace_scoring.rb)
+See [trace_scoring.rb](./examples/eval/trace_scoring.rb) for a full example.
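The multi-score shape added under "Multiple scores from one scorer" can be sketched in plain Ruby. This is only a sketch under stated assumptions: the lambda stands in for the block passed to `Braintrust::Scorer.new` and does not call the gem, the keys are assumed to be symbols as in the `:score`/`:metadata` Hash, and the metric names and word-overlap heuristics are hypothetical.

```ruby
# Plain-Ruby stand-in for a multi-score scorer block; runs without the Braintrust gem.
# Assumption: symbol keys, and metric names "coverage" and "precision" are invented here.
SUMMARY_QUALITY = lambda do |output:, expected:|
  out_words = output.downcase.scan(/\w+/).uniq
  exp_words = expected.downcase.scan(/\w+/).uniq
  shared = (out_words & exp_words).length

  # Share of expected words present in the output, and share of output words that were expected.
  coverage = exp_words.empty? ? 0.0 : shared.to_f / exp_words.length
  precision = out_words.empty? ? 0.0 : shared.to_f / out_words.length

  [
    { name: "coverage", score: coverage, metadata: { shared_words: shared } },
    { name: "precision", score: precision } # metadata left out because it is optional
  ]
end

scores = SUMMARY_QUALITY.call(output: "The cat sat", expected: "the cat sat down")
scores.each { |s| puts "#{s[:name]}: #{s[:score]}" }
```

Both metrics come from a single pass over the same inputs, which mirrors the stated motivation: when one computation yields several scores, returning them together avoids repeating the work in separate scorers.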