
Support Playground request parameters in Remote Evals (parity with Python SDK) #122

@gabe-kent

Description

Use case

When running an evaluator from the Braintrust Playground, the UI sends a POST to /eval with:

  • Evaluator name
  • A dataset (e.g. dataset_id or dataset_name) as the data source
  • Optional parameters from the Playground form (e.g. internal_model_id, definition, temperature, or other config)

The task should be able to use those parameter values (e.g. to load an internal model record or adjust behavior) while iterating over dataset rows. The Playground only runs against a dataset, so parameters are only useful if the task can read them.
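For concreteness, the POST body might look along these lines (the field and parameter names here are illustrative, not the actual wire format):

```ruby
# Illustrative POST /eval body, expressed as a Ruby hash.
# Key names are assumptions based on the description above.
body = {
  "name"       => "my-evaluator",
  "data"       => { "dataset_id" => "ds_123" },  # dataset as the data source
  "parameters" => {                              # Playground form values
    "internal_model_id" => "model_42",
    "temperature"       => 0.2
  }
}
```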


Issue

Request parameter values are never passed into the evaluator task.

  • The Eval handler in lib/braintrust/server/handlers/eval.rb never reads body["parameters"] and never passes them into evaluator.run.
  • Evaluator#run and Braintrust::Eval.run have no parameters argument.
  • The Runner in lib/braintrust/eval/runner.rb only calls task.call(test_case.input) — a single argument — so the task has no way to receive request-scoped parameters.
  • Parameter definitions (schema) are already supported: the List handler exposes them for the Playground UI. Only the values sent in the POST body are unused.

So tasks cannot use Playground-supplied parameters when running against a dataset.
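A simplified sketch of the current single-argument call path (stand-in types for illustration; the real Runner in lib/braintrust/eval/runner.rb does more):

```ruby
# Minimal stand-in for the SDK's test-case type, for illustration only.
TestCase = Struct.new(:input)

task  = ->(input) { input.to_s.upcase }  # a typical one-argument task
cases = [TestCase.new("a"), TestCase.new("b")]

# The Runner invokes the task with the row input only -- there is no
# slot through which request-scoped parameters could reach the task.
outputs = cases.map { |test_case| task.call(test_case.input) }
```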


Why SDK changes are required

  • With dataset_id (or dataset_name), the SDK resolves the data source and builds cases; the application never sees the per-row data.
  • The app therefore cannot "merge" body["parameters"] into each row's input before the SDK runs. The only way to get request-scoped parameters into the task is for the SDK to read them from the request and pass them into the task (e.g. via a second argument).
  • The Python SDK already does this: it reads and validates body["parameters"], passes them into the run, and gives the task a hooks object with hooks.parameters. The Ruby SDK should do the same so that Playground-driven evals can use parameters when running against a dataset.

Proposed changes (high level)

  1. Eval handler — Read body["parameters"], optionally validate/apply defaults using the evaluator's parameter schema, and pass the result into evaluator.run (e.g. run_opts[:parameters]).
  2. Evaluator#run and Braintrust::Eval.run — Add a parameters: keyword and pass it through to the Runner.
  3. Runner — Accept parameters: in initialize. When invoking the task, if the task accepts two arguments (e.g. task.arity == 2; note Proc#arity is negative for tasks with optional or variadic arguments, which may warrant a looser check), pass a second argument: a small hooks-like object that exposes the request parameters (e.g. hooks.parameters). Otherwise keep calling task.call(input) for backward compatibility.
  4. New type — Introduce a minimal hooks object (e.g. Braintrust::Eval::Hooks) with a #parameters method returning the request parameter Hash, and pass one instance per run as the second argument to the task when it accepts two args.
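A minimal sketch of steps 3 and 4, assuming the Hooks name and the arity check from the proposal (the real Runner integration would differ):

```ruby
module Braintrust
  module Eval
    # Hypothetical minimal hooks object from step 4: wraps the request
    # parameters Hash and exposes it via #parameters.
    class Hooks
      attr_reader :parameters

      def initialize(parameters)
        @parameters = parameters || {}
      end
    end
  end
end

# Hypothetical helper showing the Runner-side dispatch from step 3:
# two-argument tasks get a Hooks instance, one-argument tasks are
# called exactly as before.
def invoke_task(task, input, parameters)
  if task.arity == 2
    task.call(input, Braintrust::Eval::Hooks.new(parameters))
  else
    task.call(input) # backward compatible
  end
end
```

With this in place, an existing `->(input) { ... }` task keeps working unchanged, while a `->(input, hooks) { ... }` task can read `hooks.parameters`.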

After these changes, a task could use parameters the same way as in Python:

task: ->(input, hooks) {
  config = hooks.parameters
  # use config["internal_model_id"], config["temperature"], etc.
}

Metadata

Labels

enhancement (New feature or request)
