
Support Playground request parameters in Remote Evals (parity with Python SDK) #122

@gabe-kent

Description

Use case

When running an evaluator from the Braintrust Playground, the UI sends a POST to /eval with:

  • Evaluator name
  • A dataset (e.g. dataset_id or dataset_name) as the data source
  • Optional parameters from the Playground form (e.g. internal_model_id, definition, temperature, or other config)

The task should be able to use those parameter values (e.g. to load an internal model record or adjust behavior) while iterating over dataset rows. The Playground only runs against a dataset, so parameters are only useful if the task can read them.
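For concreteness, the POST body might look along these lines (the field and parameter names here are illustrative, not the actual wire format):

```ruby
# Illustrative POST /eval body, expressed as a Ruby hash.
# Key names are assumptions based on the description above.
body = {
  "name"       => "my-evaluator",
  "data"       => { "dataset_id" => "ds_123" },  # dataset as the data source
  "parameters" => {                              # Playground form values
    "internal_model_id" => "model_42",
    "temperature"       => 0.2
  }
}
```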


Issue

Request parameter values are never passed into the evaluator task.

  • The Eval handler in lib/braintrust/server/handlers/eval.rb never reads body["parameters"] and never passes them into evaluator.run.
  • Evaluator#run and Braintrust::Eval.run have no parameters argument.
  • The Runner in lib/braintrust/eval/runner.rb only calls task.call(test_case.input) — a single argument — so the task has no way to receive request-scoped parameters.
  • Parameter definitions (schema) are already supported: the List handler exposes them for the Playground UI. Only the values sent in the POST body are unused.

So tasks cannot use Playground-supplied parameters when running against a dataset.
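A simplified sketch of the current single-argument call path (stand-in types for illustration; the real Runner in lib/braintrust/eval/runner.rb does more):

```ruby
# Minimal stand-in for the SDK's test-case type, for illustration only.
TestCase = Struct.new(:input)

task  = ->(input) { input.to_s.upcase }  # a typical one-argument task
cases = [TestCase.new("a"), TestCase.new("b")]

# The Runner invokes the task with the row input only -- there is no
# slot through which request-scoped parameters could reach the task.
outputs = cases.map { |test_case| task.call(test_case.input) }
```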


Why SDK changes are required

  • With dataset_id (or dataset_name), the SDK resolves the data source and builds cases; the application never sees the per-row data.
  • The app therefore cannot "merge" body["parameters"] into each row's input before the SDK runs. The only way to get request-scoped parameters into the task is for the SDK to read them from the request and pass them into the task (e.g. via a second argument).
  • The Python SDK already does this: it reads and validates body["parameters"], passes them into the run, and gives the task a hooks object with hooks.parameters. The Ruby SDK should do the same so that Playground-driven evals can use parameters when running against a dataset.

Proposed changes (high level)

  1. Eval handler — Read body["parameters"], optionally validate/apply defaults using the evaluator's parameter schema, and pass the result into evaluator.run (e.g. run_opts[:parameters]).
  2. Evaluator#run and Braintrust::Eval.run — Add a parameters: keyword and pass it through to the Runner.
  3. Runner — Accept parameters: in initialize. When invoking the task, if the task accepts two arguments (e.g. task.arity == 2; note Proc#arity is negative for tasks with optional or variadic arguments, which may warrant a looser check), pass a second argument: a small hooks-like object that exposes the request parameters (e.g. hooks.parameters). Otherwise keep calling task.call(input) for backward compatibility.
  4. New type — Introduce a minimal hooks object (e.g. Braintrust::Eval::Hooks) with a #parameters method returning the request parameter Hash, and pass one instance per run as the second argument to the task when it accepts two args.
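A minimal sketch of steps 3 and 4, assuming the Hooks name and the arity check from the proposal (the real Runner integration would differ):

```ruby
module Braintrust
  module Eval
    # Hypothetical minimal hooks object from step 4: wraps the request
    # parameters Hash and exposes it via #parameters.
    class Hooks
      attr_reader :parameters

      def initialize(parameters)
        @parameters = parameters || {}
      end
    end
  end
end

# Hypothetical helper showing the Runner-side dispatch from step 3:
# two-argument tasks get a Hooks instance, one-argument tasks are
# called exactly as before.
def invoke_task(task, input, parameters)
  if task.arity == 2
    task.call(input, Braintrust::Eval::Hooks.new(parameters))
  else
    task.call(input) # backward compatible
  end
end
```

With this in place, an existing `->(input) { ... }` task keeps working unchanged, while a `->(input, hooks) { ... }` task can read `hooks.parameters`.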

After these changes, a task could use parameters the same way as in Python:

task: ->(input, hooks) {
  config = hooks.parameters
  # use config["internal_model_id"], config["temperature"], etc.
}

Metadata

Labels

enhancement (New feature or request)
