Labels: enhancement (New feature or request)

Description
Use case
When running an evaluator from the Braintrust Playground, the UI sends a POST to /eval with:
- Evaluator name
- A dataset (e.g. `dataset_id` or `dataset_name`) as the data source
- Optional parameters from the Playground form (e.g. `internal_model_id`, `definition`, `temperature`, or other config)
The task should be able to use those parameter values (e.g. to load an internal model record or adjust behavior) while iterating over dataset rows. The Playground only runs against a dataset, so parameters are only useful if the task can read them.
Issue
Request parameter values are never passed into the evaluator task.
- The Eval handler in `lib/braintrust/server/handlers/eval.rb` never reads `body["parameters"]` and never passes them into `evaluator.run`. `Evaluator#run` and `Braintrust::Eval.run` have no `parameters` argument.
- The Runner in `lib/braintrust/eval/runner.rb` only calls `task.call(test_case.input)` — a single argument — so the task has no way to receive request-scoped parameters.
- Parameter definitions (schema) are already supported: the List handler exposes them for the Playground UI. Only the values sent in the POST body are unused.
So tasks cannot use Playground-supplied parameters when running against a dataset.
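To make the limitation concrete, here is a simplified sketch (not the actual runner code) of the single-argument call site described above: the task receives only the row input, so parameter values from the POST body have no path to reach it.

```ruby
# Simplified sketch of the current dispatch in lib/braintrust/eval/runner.rb:
# the task is invoked with the row input only, so Playground parameters
# sent in the POST body never reach it.
task = ->(input) { "answered: #{input}" }

# Single argument; no hooks object, no request parameters.
output = task.call("What is 2 + 2?")
```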
Why SDK changes are required
- With dataset_id (or dataset_name), the SDK resolves the data source and builds cases; the application never sees the per-row data.
- The app therefore cannot "merge" `body["parameters"]` into each row's `input` before the SDK runs. The only way to get request-scoped parameters into the task is for the SDK to read them from the request and pass them into the task (e.g. via a second argument).
- The Python SDK already does this: it reads and validates `body["parameters"]`, passes them into the run, and gives the task a `hooks` object with `hooks.parameters`. The Ruby SDK should do the same so that Playground-driven evals can use parameters when running against a dataset.
Proposed changes (high level)
- Eval handler — Read `body["parameters"]`, optionally validate/apply defaults using the evaluator's parameter schema, and pass the result into `evaluator.run` (e.g. `run_opts[:parameters]`).
- Evaluator#run and Braintrust::Eval.run — Add a `parameters:` keyword and pass it through to the Runner.
- Runner — Accept `parameters:` in `initialize`. When invoking the task, if the task accepts two arguments (e.g. `task.arity == 2`), pass a second argument: a small hooks-like object that exposes the request parameters (e.g. `hooks.parameters`). Otherwise keep calling `task.call(input)` for backward compatibility.
- New type — Introduce a minimal hooks object (e.g. `Braintrust::Eval::Hooks`) with a `#parameters` method returning the request parameter Hash, and pass one instance per run as the second argument to the task when it accepts two args.
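The Runner and Hooks changes could be sketched as follows. Class and method names match the suggestions above but are assumptions, not a final API; the `invoke_task` helper is hypothetical shorthand for the Runner's internal dispatch.

```ruby
# Hypothetical sketch of the proposed Hooks type and arity-based dispatch.
module Braintrust
  module Eval
    # Minimal hooks object exposing request-scoped parameters to the task.
    class Hooks
      attr_reader :parameters

      def initialize(parameters)
        @parameters = parameters || {}
      end
    end
  end
end

# Stand-in for the Runner's task invocation: two-arg tasks get a Hooks
# instance; one-arg tasks are called exactly as before.
def invoke_task(task, input, parameters)
  if task.arity == 2
    task.call(input, Braintrust::Eval::Hooks.new(parameters))
  else
    task.call(input) # backward compatible: existing one-arg tasks unchanged
  end
end

one_arg = ->(input) { input.upcase }
two_arg = ->(input, hooks) { "#{input} via #{hooks.parameters["internal_model_id"]}" }
```

Dispatching on `task.arity` keeps every existing evaluator working unmodified while letting new tasks opt in to parameters simply by declaring a second argument.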
After these changes, a task could use parameters the same way as in Python:
```ruby
task: ->(input, hooks) {
  config = hooks.parameters
  # use config["internal_model_id"], config["temperature"], etc.
}
```
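On the handler side, the change reduces to reading the values out of the POST body and threading them into the run options. A minimal sketch, assuming the `run_opts[:parameters]` convention proposed above (variable names are illustrative):

```ruby
# Hypothetical handler-side flow: pull parameter values from the /eval POST
# body and forward them via run options. Validation against the evaluator's
# parameter schema could happen here before the assignment.
require "json"

body = JSON.parse('{"parameters": {"internal_model_id": "m-123", "temperature": 0.2}}')

run_opts = {}
params = body["parameters"]
run_opts[:parameters] = params if params.is_a?(Hash)
# evaluator.run(**run_opts) would then pass parameters: through to the Runner
```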