Selectors

A ModelSelector picks the best model combination for an agent on a dataset: it instantiates the agent with each candidate {node: model} dict, evaluates it, and ranks the results by eval_fn. All selectors share one constructor signature and one entry point, select_best(), and differ only in their search algorithm.

```python
from agentopt import ModelSelector

selector = ModelSelector(
    agent=MyAgent,                # your agent class (see Common parameters)
    models={"planner": ["gpt-4o", "gpt-4o-mini"], "solver": ["gpt-4o-mini"]},
    eval_fn=lambda expected, actual: float(actual == expected),
    dataset=[(inp, expected), ...],
    method="auto",                # default: arm elimination, cheaper than brute force
)
results = selector.select_best(parallel=True, max_concurrent=20)
results.print_summary()
```
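The loop in the opening paragraph (instantiate per candidate dict, evaluate, rank) can be sketched in plain Python. This is a hypothetical illustration of what an exhaustive search does conceptually, not agentopt's implementation. enumerate_combos, brute_force_rank and EchoAgent are made-up names, the agent is a stub that makes no API calls, and the sketch ranks by mean score only. It does not apply the latency and price tie-breaks the real selectors use.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def enumerate_combos(models: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Cartesian product of each node's candidates, one {node: model} dict per combo."""
    nodes = list(models)
    return [dict(zip(nodes, choice)) for choice in product(*(models[n] for n in nodes))]


def brute_force_rank(
    agent_cls: type,
    models: Dict[str, List[str]],
    eval_fn: Callable[[object, object], float],
    dataset: Sequence[Tuple[object, object]],
) -> List[Tuple[Dict[str, str], float]]:
    """Score every combo on the full dataset; return (combo, mean score), best first."""
    scored = []
    for combo in enumerate_combos(models):
        agent = agent_cls(combo)  # duck-typed: __init__(self, models)
        scores = [eval_fn(expected, agent.run(inp)) for inp, expected in dataset]
        scored.append((combo, sum(scores) / len(scores)))
    # sorted() is stable, so combos with equal means keep enumeration order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


class EchoAgent:
    """Hypothetical stub agent: answers correctly only when solver is "gpt-4o"."""

    def __init__(self, models: Dict[str, str]):
        self.models = models

    def run(self, input_data: str) -> str:
        return input_data.upper() if self.models["solver"] == "gpt-4o" else input_data


models = {"planner": ["gpt-4o", "gpt-4o-mini"], "solver": ["gpt-4o", "gpt-4o-mini"]}
dataset = [("a", "A"), ("b", "B")]
ranked = brute_force_rank(EchoAgent, models, lambda exp, act: float(act == exp), dataset)
for combo, mean_score in ranked:
    print(combo, mean_score)
```

With the models dict from the example above (two planners, one solver), the search space holds only two combos. It grows multiplicatively with each node you add, which is why the bandit-style methods exist.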

Common parameters

| Parameter | Type | Description |
|---|---|---|
| `agent` | `type` | Agent class with `__init__(self, models)` and `run(self, input_data)`. Duck-typed: no base class required. |
| `models` | `Dict[str, List]` | Maps node names to candidate model lists (e.g. `{"planner": ["gpt-4o", "gpt-4o-mini"]}`). |
| `eval_fn` | `Callable` | `(expected, actual) -> float` score (higher is better). |
| `dataset` | `Sequence[Tuple]` | `[(input_data, expected_answer), ...]`. |
| `model_prices` | `Dict`, optional | Custom pricing overrides, `{"model": {"input_price": x, "output_price": y}}`, in USD per million tokens ($/MTok). Recommended when `lambda_cost > 0`; otherwise cost terms rely on the tracker's USD figures. |
| `lambda_cost` | `float`, optional | Weight on normalized per-sample cost in the combined objective. Default `0.0` (disabled). See Combined objective below. |
| `lambda_latency` | `float`, optional | Weight on normalized per-sample latency in the combined objective. Default `0.0` (disabled). |
| `node_descriptions` | `Dict[str, str]`, optional | Human-readable description of each node, used by `LMProposalModelSelector` to propose combinations. |
| `tracker` | `LLMTracker`, optional | Tracker to use instead of the default, a fresh `LLMTracker()` started in the constructor. Pass your own to share a cache across runs, route calls through a daemon (`AGENTOPT_GATEWAY_URL`), or post-process records after `select_best()` returns. |
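As a worked example of the $/MTok unit used by model_prices, the cost of one call is the token count divided by one million, times the per-MTok price, summed over input and output. The helper below is hypothetical and assumes cost is linear in token counts. The tracker's own accounting may differ. The prices are the ones from the Example further down.

```python
from typing import Dict

# Same shape and unit as `model_prices`: USD per million tokens ($/MTok).
MODEL_PRICES: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input_price": 2.5, "output_price": 10.0},
    "gpt-4o-mini": {"input_price": 0.15, "output_price": 0.6},
}


def call_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    prices: Dict[str, Dict[str, float]] = MODEL_PRICES,
) -> float:
    """Hypothetical helper: linear cost of one call from per-MTok prices."""
    p = prices[model]
    return (input_tokens * p["input_price"] + output_tokens * p["output_price"]) / 1_000_000


# 1,000 input + 500 output tokens on gpt-4o: (1000 * 2.5 + 500 * 10.0) / 1e6 = 0.0075 USD
print(call_cost_usd("gpt-4o", 1_000, 500))
```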

The selector calls tracker.start() in the constructor and tracker.stop() when select_best() returns or raises. Record queries on the tracker remain valid after stop(), so post-run analysis works:

```python
tracker = LLMTracker(cache_dir="./shared_cache")
selector = ModelSelector(..., tracker=tracker)
selector.select_best()
print(tracker.get_usage())          # tracker.stop() already called; records still here
```
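The start-in-constructor, stop-on-return-or-raise contract is the usual try/finally pattern. The toy model below shows only that lifecycle. ToyTracker and ToySelector are hypothetical stand-ins, not LLMTracker or ModelSelector, and they do no caching or routing.

```python
from typing import Any, List, Optional


class ToyTracker:
    """Hypothetical stand-in for LLMTracker: records survive stop()."""

    def __init__(self) -> None:
        self.running = False
        self.records: List[Any] = []

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False  # a real tracker would flush caches here; records are kept

    def record(self, item: Any) -> None:
        self.records.append(item)

    def get_usage(self) -> List[Any]:
        return list(self.records)


class ToySelector:
    """Hypothetical selector showing only the tracker lifecycle."""

    def __init__(self, tracker: Optional[ToyTracker] = None) -> None:
        self.tracker = tracker if tracker is not None else ToyTracker()
        self.tracker.start()  # started in the constructor

    def select_best(self, fail: bool = False) -> str:
        try:
            self.tracker.record("call-1")
            if fail:
                raise RuntimeError("evaluation failed")
            return "best-combo"
        finally:
            self.tracker.stop()  # runs on normal return and on exceptions


tracker = ToyTracker()
selector = ToySelector(tracker=tracker)
selector.select_best()
print(tracker.running, tracker.get_usage())
```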

See tracker.md for the full tracker surface.

Combined objective (optional cost/latency weights)

By default, selectors optimize the eval_fn score only (typically accuracy) and break ties by latency, then price. To trade accuracy against cost and latency in one scalar reward, pass the optional weights to the selector constructor, or through ModelSelector(..., **kwargs):

| Parameter | Default | Effect |
|---|---|---|
| `lambda_cost` | `0.0` | Penalizes normalized per-sample token cost (USD from the tracker, or from `model_prices`). |
| `lambda_latency` | `0.0` | Penalizes normalized per-sample wall-clock latency (seconds). |

Omit both parameters (or leave them at 0.0) for the original accuracy-centric behavior. Set one or both when you want multi-metric selection.
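With both weights at 0.0, the default ordering (score first, then latency, then price) amounts to a lexicographic sort key. The sketch assumes that "tie" means exactly equal mean scores. ComboStats and rank_default are illustrative names, not agentopt's result types (see results.md for ModelResult).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComboStats:
    """Hypothetical per-combo summary (not agentopt's ModelResult)."""
    name: str
    score: float      # mean eval_fn value, higher is better
    latency_s: float  # mean per-sample latency, lower is better
    price_usd: float  # mean per-sample cost, lower is better


def rank_default(results: List[ComboStats]) -> List[ComboStats]:
    # Highest score first; equal scores fall back to lower latency, then lower price.
    return sorted(results, key=lambda r: (-r.score, r.latency_s, r.price_usd))


ranked = rank_default([
    ComboStats("a", 0.9, 2.0, 0.010),
    ComboStats("b", 0.9, 1.5, 0.020),
    ComboStats("c", 0.8, 0.5, 0.001),
    ComboStats("d", 0.9, 1.5, 0.010),
])
print([r.name for r in ranked])
```

Note that "c" is the fastest and cheapest combo but still ranks last. Without the lambdas, cost and latency never outweigh a score difference.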

Formula

For each datapoint, after observations are recorded:

```text
combined = score
         - lambda_cost    * norm(cost)
         - lambda_latency * norm(latency)
```

- score: the return value of eval_fn (higher is better).
- norm(·): min–max scaling to [0, 1] using the running min/max over all samples seen so far in that selector run; the bounds update as more combos are evaluated.
- Per combination: the mean of the per-datapoint combined values, reported as ModelResult.combined_objective (see results.md).

This is a linear scalarization, not Pareto exploration. Larger lambda_* values penalize cost and latency more strongly relative to score.
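The formula with a running min–max normalizer can be sketched as follows. This is a minimal illustration, not agentopt's code. It assumes the current sample counts as "seen" before it is normalized, and that a metric with zero spread contributes no penalty. In the library the running bounds span every combo in the run; this example feeds samples from a single combo for brevity.

```python
from typing import List, Tuple


class RunningMinMax:
    """Min–max normalizer whose bounds widen as samples arrive."""

    def __init__(self) -> None:
        self.lo = float("inf")
        self.hi = float("-inf")

    def update(self, x: float) -> None:
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def norm(self, x: float) -> float:
        span = self.hi - self.lo
        # Assumption: a constant metric (span == 0) contributes no penalty.
        return 0.0 if span <= 0 else (x - self.lo) / span


def combined(score, cost, latency, cost_n, lat_n, lambda_cost=0.0, lambda_latency=0.0):
    """combined = score - lambda_cost * norm(cost) - lambda_latency * norm(latency)."""
    return score - lambda_cost * cost_n.norm(cost) - lambda_latency * lat_n.norm(latency)


# Per-datapoint (score, cost_usd, latency_s) for one combo, in evaluation order.
samples: List[Tuple[float, float, float]] = [(1.0, 0.002, 1.0), (1.0, 0.010, 3.0), (0.0, 0.006, 2.0)]
cost_n, lat_n = RunningMinMax(), RunningMinMax()
per_sample = []
for score, cost, latency in samples:
    cost_n.update(cost)   # assumption: the current sample counts as "seen"
    lat_n.update(latency)
    per_sample.append(combined(score, cost, latency, cost_n, lat_n, lambda_cost=0.3, lambda_latency=0.2))
combined_objective = sum(per_sample) / len(per_sample)  # mean over datapoints
print(per_sample, combined_objective)
```

The first sample gets no penalty because the bounds have zero width at that point. That early-sample effect is one reason running values are not directly comparable across combos until they are recomputed.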

Example

```python
selector = ModelSelector(
    agent=MyAgent,
    models=models,
    eval_fn=eval_fn,
    dataset=dataset,
    method="matrix_ucb",
    lambda_cost=0.3,      # optional; omit for accuracy-only selection
    lambda_latency=0.2,
    model_prices={        # recommended when lambda_cost > 0
        "gpt-4o": {"input_price": 2.5, "output_price": 10.0},
        "gpt-4o-mini": {"input_price": 0.15, "output_price": 0.6},
    },
)
results = selector.select_best(parallel=True)
results.print_summary()   # ranks by combined_objective when lambdas are set
```

How each method uses the weights

| Methods | During search | Final `is_best` |
|---|---|---|
| `matrix_ucb`, `matrix_ucb_lrf` | UCB rewards use the per-cell combined objective | `_find_best` on `combined_objective` |
| `arm_elimination`, `epsilon_lucb`, `threshold` | Elimination / LUCB statistics use combined per-sample objectives | Same |
| `hill_climbing`, `bayesian` | Move / surrogate target uses the combined objective | Same |
| `brute_force`, `random` | Weights do not steer which combos are tried | Same |
| `lm_proposal` | Proposer uses the `objective=` text, not these lambdas | `combined_objective` on the one evaluated combo only |

After select_best(), a final pass recomputes every result’s combined_objective against the full-run normalizer so rankings are comparable.
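That final pass can be pictured as rebuilding a single normalizer from every stored sample and re-scoring each combo against it. The sketch below is hypothetical: recompute_combined and minmax are made-up names, and it assumes raw per-sample (score, cost, latency) triples are kept per combo. The example also shows how the weights can flip a ranking.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float, float]  # (score, cost_usd, latency_s)


def minmax(values: List[float]) -> Callable[[float], float]:
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant metric (span == 0) contributes no penalty.
    return lambda x: 0.0 if span == 0 else (x - lo) / span


def recompute_combined(
    raw: Dict[str, List[Sample]], lambda_cost: float = 0.0, lambda_latency: float = 0.0
) -> Dict[str, float]:
    """Re-score every combo against one normalizer built from all samples in the run."""
    everything = [s for per_combo in raw.values() for s in per_combo]
    norm_cost = minmax([c for _, c, _ in everything])
    norm_lat = minmax([lat for _, _, lat in everything])
    out = {}
    for combo, per_combo in raw.items():
        vals = [
            score - lambda_cost * norm_cost(c) - lambda_latency * norm_lat(lat)
            for score, c, lat in per_combo
        ]
        out[combo] = sum(vals) / len(vals)
    return out


raw = {
    "mini+mini": [(1.0, 0.001, 1.0), (0.0, 0.001, 1.0)],
    "4o+4o": [(1.0, 0.011, 3.0), (1.0, 0.011, 3.0)],
}
accuracy_only = recompute_combined(raw)                                  # 4o+4o wins
weighted = recompute_combined(raw, lambda_cost=0.4, lambda_latency=0.2)  # mini+mini wins
print(accuracy_only, weighted)
```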

!!! note "lm_proposal vs lambdas"
    LMProposalModelSelector(objective="...") is a natural-language hint to the proposer LLM. It is separate from lambda_cost / lambda_latency, which only affect the scalar reward used for ranking and bandit methods.

`select_best()`

```python
results = selector.select_best(
    parallel=False,        # If True, evaluate combos concurrently with asyncio
    max_concurrent=20,     # Total concurrent API-call budget across all combos
)
```

Returns a SelectionResults object. With parallel=True, agent.run must be either async or thread-safe. The selector splits max_concurrent between the outer loop (combos) and the inner loop (datapoints) based on dataset size.
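Here is one way a single shared call budget can be enforced with asyncio, and why a sync run must be thread-safe: sync calls are pushed to worker threads and may overlap. This is a hypothetical sketch, not agentopt's scheduler. It uses one semaphore for all calls and does not reproduce the library's outer/inner split, whose exact rule is not documented here. The agent classes are made-up test doubles.

```python
import asyncio
import inspect
import threading
import time
from typing import Any, List


async def call_run(agent: Any, input_data: Any, budget: asyncio.Semaphore) -> Any:
    """Run one datapoint while holding a slot from the shared budget."""
    async with budget:
        if inspect.iscoroutinefunction(agent.run):
            return await agent.run(input_data)  # async agent: await directly
        return await asyncio.to_thread(agent.run, input_data)  # sync agent: worker thread


async def evaluate_all(agent: Any, inputs: List[Any], max_concurrent: int) -> List[Any]:
    budget = asyncio.Semaphore(max_concurrent)  # one budget shared by every call
    return await asyncio.gather(*(call_run(agent, x, budget) for x in inputs))


class SlowSyncAgent:
    """Hypothetical sync agent that records its peak number of concurrent run() calls."""

    def __init__(self, models: dict):
        self.models = models
        self._lock = threading.Lock()  # run() is called from several threads
        self.active = 0
        self.peak = 0

    def run(self, input_data: Any) -> Any:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for a blocking API call
        with self._lock:
            self.active -= 1
        return input_data


class EchoAsyncAgent:
    """Hypothetical async agent."""

    def __init__(self, models: dict):
        self.models = models

    async def run(self, input_data: Any) -> Any:
        await asyncio.sleep(0)
        return input_data * 2


sync_agent = SlowSyncAgent({})
print(asyncio.run(evaluate_all(sync_agent, list(range(20)), max_concurrent=4)), sync_agent.peak)
```

asyncio.to_thread requires Python 3.9 or later. asyncio.gather returns results in input order, whatever order the calls finish in.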

!!! note "Automatic cleanup"
    select_best() calls tracker.stop() on return or exception: caches flush to disk and masters tear down. The tracker remains queryable; only close(), which select_best() does not call, drops the remote backend's HTTP client.

Choosing a method

| `method` | Algorithm | When to use |
|---|---|---|
| `"auto"` (default) | Arm elimination | Strong best-arm identification at lower search cost than brute force. Same implementation as `"arm_elimination"`. |
| `"brute_force"` | Evaluate every combo on the full dataset | Small search spaces; ground-truth comparison. |
| `"random"` | Random search | Cheap baseline. |
| `"hill_climbing"` | Greedy per-node search | Large combinatorial spaces with weak coupling between nodes. |
| `"arm_elimination"` | Successive elimination | Best-arm identification with PAC-style guarantees. |
| `"epsilon_lucb"` | LUCB with tolerance | Stop once a combo is within ε of the best. |
| `"matrix_ucb"` / `"matrix_ucb_lrf"` | UCB exploiting cross-combo structure | Large model × datapoint matrices; `lrf` adds low-rank factorization. |
| `"threshold"` | Threshold bandit successive elimination | "Find all combos above accuracy θ" rather than the single best. |
| `"lm_proposal"` | LM-guided | Uses `node_descriptions` to propose combinations. |
| `"bayesian"` | Bayesian optimization | Optional extra: `pip install "agentopt-py[bayesian]"`. |
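For intuition about "arm_elimination" (and therefore "auto"), here is a textbook successive-elimination sketch. Several details are assumptions, not agentopt's documented behavior: each combo is treated as an arm, each datapoint as one pull, and scores are assumed to lie in [0, 1]. The confidence radius is a Hoeffding bound with a union bound over arms and rounds, and delta is an illustrative default. Precomputed rewards keep the example deterministic.

```python
import math
from typing import Dict, List


def successive_elimination(rewards: Dict[str, List[float]], delta: float = 0.05) -> List[str]:
    """Return the combos still active after successive elimination.

    rewards[arm][t] is the score in [0, 1] of `arm` on datapoint t. Each round
    evaluates every still-active arm on one more datapoint.
    """
    active = list(rewards)
    k = len(active)
    n = min(len(r) for r in rewards.values())
    totals = {a: 0.0 for a in active}
    for t in range(1, n + 1):
        for a in active:
            totals[a] += rewards[a][t - 1]
        means = {a: totals[a] / t for a in active}
        # Hoeffding radius with a union bound over k arms and all rounds.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        best = max(means.values())
        # Drop any arm whose empirical gap to the leader is at least 2 * radius.
        active = [a for a in active if best - means[a] < 2 * radius]
        if len(active) == 1:
            break
    return active


survivors = successive_elimination(
    {"strong": [1.0] * 200, "weak": [0.0] * 200, "strong_too": [1.0] * 200}
)
print(survivors)
```

On a finite dataset more than one combo can survive, as with the two tied combos here. A selector would then pick among the survivors by mean score.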

Selector Classes

::: agentopt.model_selection.brute_force.BruteForceModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.random_search.RandomSearchModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.hill_climbing.HillClimbingModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.arm_elimination.ArmEliminationModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.epsilon_lucb.EpsilonLUCBModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.matrix_ucb.MatrixUCBModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.matrix_ucb.MatrixUCBLRFModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.threshold_successive_elimination.ThresholdBanditSEModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.lm_proposal.LMProposalModelSelector
    options:
      members: false
      show_bases: false

::: agentopt.model_selection.bayesian_optimization.BayesianOptimizationModelSelector
    options:
      members: false
      show_bases: false
