AgentOptimizer · QianJaneXie · May 28, 2026 · May 28, 2026
diff --git a/docs/api/selectors.md b/docs/api/selectors.md
@@ -11,6 +11,8 @@ selector = ModelSelector(
     eval_fn=lambda expected, actual: float(actual == expected),
     dataset=[(inp, expected), ...],
     method="auto",                # arm_elimination — strong + cheap
+    objective_mode="weighted",
+    lambda_latency=0.2,
 )
 results = selector.select_best(parallel=True, max_concurrent=20)
 results.print_summary()
@@ -24,9 +26,10 @@ results.print_summary()
 | `models` | `Dict[str, List]` | Maps node names to candidate model lists (e.g. `{"planner": ["gpt-4o", "gpt-4o-mini"]}`). |
 | `eval_fn` | `Callable` | `(expected, actual) -> float` score (higher is better). |
 | `dataset` | `Sequence[Tuple]` | `[(input_data, expected_answer), ...]`. |
+| `objective_mode` | `str`, **required** | `"weighted"`: returns one recommended combo, scalarized via `lambda_cost` / `lambda_latency`. `"pareto"`: returns the empirical Pareto frontier over (error, latency, cost); `matrix_ucb` and `matrix_ucb_lrf` use Chebyshev scalarization for exploration. |
 | `model_prices` | `Dict`, optional | Custom pricing overrides: `{"model": {"input_price": x, "output_price": y}}` in $/MTok. Required for cost terms when `lambda_cost > 0`. |
-| `lambda_cost` | `float`, optional | Weight on **normalized** per-sample cost in the combined objective. Default `0.0` (disabled). See [Combined objective](#combined-objective-optional-costlatency-weights) below. |
-| `lambda_latency` | `float`, optional | Weight on **normalized** per-sample latency in the combined objective. Default `0.0` (disabled). |
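+| `lambda_cost` | `float`, default `0.0` | Weight on **normalized** per-sample cost. Used in **weighted** mode only, where at least one `lambda_*` must be `> 0`. |
+| `lambda_latency` | `float`, default `0.0` | Weight on **normalized** per-sample latency. Used in **weighted** mode only, where at least one `lambda_*` must be `> 0`. |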
 | `node_descriptions` | `Dict[str, str]`, optional | Human-readable descriptions per node — surfaced in `LMProposalModelSelector`. |
 | `tracker` | `LLMTracker`, optional | Bring your own. Defaults to a fresh `LLMTracker()` started in the constructor. Pass one in to share a cache across runs, route via a daemon (`AGENTOPT_GATEWAY_URL`), or post-process records after `select_best()` returns. |
 
@@ -41,67 +44,55 @@ print(tracker.get_usage())          # tracker.stop() already called; records sti
 
 See [tracker.md](tracker.md) for the full tracker surface.
 
-## Combined objective (optional cost/latency weights)
-
-By default, selectors optimize **`eval_fn` score only** (typically accuracy) and break ties with latency, then price. To trade accuracy against cost and latency in one scalar reward, pass optional weights on the constructor (or via `ModelSelector(..., **kwargs)`):
-
-| Parameter | Default | Effect |
-|:---|:---|:---|
-| `lambda_cost` | `0.0` | Penalizes normalized per-sample **token cost** (USD from the tracker, or `model_prices`). |
-| `lambda_latency` | `0.0` | Penalizes normalized per-sample **wall-clock latency** (seconds). |
+## Objective mode (required)
 
-Omit both parameters (or leave them at `0.0`) for the original accuracy-centric behavior. Set one or both when you want multi-metric selection.
+You must set `objective_mode` on every selector.
 
-### Formula
+### `objective_mode="weighted"`
 
-For each datapoint, after observations are recorded:
+Pass `lambda_cost > 0`, `lambda_latency > 0`, or both. The library returns a single combo flagged **`is_best`**, ranked by a linear scalarization (score minus weighted, normalized cost and latency):
 
 ```
-combined = score
-         - lambda_cost    * norm(cost)
-         - lambda_latency * norm(latency)
+combined = score - lambda_cost * norm(cost) - lambda_latency * norm(latency)
 ```
 
-- **`score`** — return value of `eval_fn` (higher is better).
-- **`norm(·)`** — min–max scale to `[0, 1]` using running min/max over **all** samples seen during that selector run (updated as more combos are evaluated).
-- **Per combination** — mean of per-datapoint combined values → `ModelResult.combined_objective` (see [results.md](results.md)).
+```python
+selector = ModelSelector(
+    ...,
+    objective_mode="weighted",
+    lambda_cost=0.3,
+    lambda_latency=0.2,
+    model_prices={...},
+)
+results = selector.select_best()
+best = results.get_best()
+```
 
-This is a **linear scalarization**, not Pareto exploration. Larger `lambda_*` penalize cost/latency more strongly relative to score.
+### `objective_mode="pareto"`
 
-### Example
+Do **not** pass `lambda_cost` or `lambda_latency`. The library minimizes **error** (`1 - score`), **latency**, and **cost** (when priced), marks nondominated combos, and exposes `results.get_pareto_front()` and `results.plot_pareto()` (error on the y-axis; ideal corner at 0).
 
 ```python
 selector = ModelSelector(
-    agent=MyAgent,
-    models=models,
-    eval_fn=eval_fn,
-    dataset=dataset,
+    ...,
     method="matrix_ucb",
-    lambda_cost=0.3,      # optional — omit for accuracy-only
-    lambda_latency=0.2,
-    model_prices={        # recommended when lambda_cost > 0
-        "gpt-4o": {"input_price": 2.5, "output_price": 10.0},
-        "gpt-4o-mini": {"input_price": 0.15, "output_price": 0.6},
-    },
+    objective_mode="pareto",
 )
-results = selector.select_best(parallel=True)
-results.print_summary()   # ranks by combined_objective when lambdas are set
+results = selector.select_best()
+results.get_pareto_front()
+results.plot_pareto()
 ```
 
-### How each method uses the weights
+For `matrix_ucb` / `matrix_ucb_lrf`, exploration uses **Chebyshev scalarization** over normalized gaps to the ideal point (0 error, 0 s latency, $0 cost). Trade-off directions rotate automatically, so there are no extra parameters to tune.
 
-| Methods | During search | Final `is_best` |
+| Methods | Weighted search | Pareto search |
 |:---|:---|:---|
-| `matrix_ucb`, `matrix_ucb_lrf` | UCB rewards use per-cell combined objective | `_find_best` on `combined_objective` |
-| `arm_elimination`, `epsilon_lucb`, `threshold` | Elimination / LUCB stats on combined per-sample objectives | same |
-| `hill_climbing`, `bayesian` | Move / surrogate target uses combined objective | same |
-| `brute_force`, `random` | Does not steer *which* combos to try | same |
-| `lm_proposal` | Proposer uses `objective=` **text**, not these lambdas | `combined_objective` on the one evaluated combo only |
-
-After `select_best()`, a final pass recomputes every result’s `combined_objective` against the **full-run** normalizer so rankings are comparable.
+| `matrix_ucb`, `matrix_ucb_lrf` | Per-cell linear combined objective | Chebyshev cell reward |
+| Other bandits (`arm_elimination`, `epsilon_lucb`, `threshold`) | Combined per-sample stats where applicable | Full eval → frontier marking |
+| `brute_force`, `random` | Final rank only | Final frontier only |
 
 !!! note "`lm_proposal` vs lambdas"
-    `LMProposalModelSelector(objective="...")` is a natural-language hint to the **proposer LLM**. It is separate from `lambda_cost` / `lambda_latency`, which only affect the scalar reward used for ranking and bandit methods.
+    `LMProposalModelSelector(objective="...")` is a natural-language hint to the **proposer LLM**. It is separate from `objective_mode` and `lambda_*`.
 
 ## `select_best()`
 

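Note on the weighted formula in `docs/api/selectors.md`: the new text keeps `norm(...)` but drops the earlier definition (min-max scaling to `[0, 1]`). The sketch below shows one way the scalar could be computed, assuming that definition still holds. `min_max` and `combined_objective` are hypothetical helper names, not library functions, and the normalizer here runs over the values passed in rather than over a running min/max.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combined_objective(
    scores: Sequence[float],
    costs: Sequence[float],
    latencies: Sequence[float],
    lambda_cost: float = 0.0,
    lambda_latency: float = 0.0,
) -> List[float]:
    """Per-sample score minus weighted, normalized cost and latency."""
    norm_cost = min_max(costs)
    norm_latency = min_max(latencies)
    return [
        s - lambda_cost * c - lambda_latency * t
        for s, c, t in zip(scores, norm_cost, norm_latency)
    ]


if __name__ == "__main__":
    # Two samples: the second is wrong, more expensive, and slower.
    values = combined_objective(
        scores=[1.0, 0.0],
        costs=[0.002, 0.010],
        latencies=[1.0, 3.0],
        lambda_cost=0.3,
        lambda_latency=0.2,
    )
    print(values)
```

Averaging these per-sample values per combination would give a single number to rank on, which is how the earlier version of the doc described `combined_objective`.
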
diff --git a/examples/selection/daemon/basic.py b/examples/selection/daemon/basic.py
@@ -91,6 +91,7 @@ def eval_fn(expected: str, actual: str) -> float:
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",
+        objective_mode="pareto",
     )
     results = selector.select_best(parallel=False)
     results.print_summary()

diff --git a/examples/selection/local/advanced_algorithms.py b/examples/selection/local/advanced_algorithms.py
@@ -98,7 +98,13 @@ def eval_fn(expected, actual):
 def run_auto():
     """method="auto" — automatically finds the best combination (default; wired to arm_elimination — strong best-arm identification, cheaper than brute_force)."""
     selector = ModelSelector(
-        agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset, method="auto",
+        agent=MyAgent,
+        models=models,
+        eval_fn=eval_fn,
+        dataset=dataset,
+        method="auto",
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -112,6 +118,8 @@ def run_random():
         dataset=dataset,
         method="random",
         sample_fraction=0.25,  # evaluate 25% of all combinations
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -125,6 +133,8 @@ def run_hill_climbing():
         dataset=dataset,
         method="hill_climbing",
         batch_size=4,  # number of neighbors to evaluate per step
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -137,6 +147,8 @@ def run_arm_elimination():
         eval_fn=eval_fn,
         dataset=dataset,
         method="arm_elimination",
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -150,6 +162,8 @@ def run_epsilon_lucb():
         dataset=dataset,
         method="epsilon_lucb",
         epsilon=0.01,  # acceptable gap from the true best
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -163,6 +177,8 @@ def run_threshold():
         dataset=dataset,
         method="threshold",
         threshold=0.75,  # minimum acceptable accuracy
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -175,6 +191,8 @@ def run_lm_proposal():
         eval_fn=eval_fn,
         dataset=dataset,
         method="lm_proposal",
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -189,6 +207,8 @@ def run_bayesian():
         method="bayesian",
         batch_size=4,
         sample_fraction=0.25,  # evaluate 25% of all combinations
+        objective_mode="weighted",
+        lambda_latency=0.2,
     )
     return selector.select_best(parallel=True)
 
@@ -203,6 +223,7 @@ def run_matrix_ucb():
         method="matrix_ucb",
         a=1.0,
         sample_fraction=0.1,
+        objective_mode="pareto",
     )
     return selector.select_best(max_concurrent=4)
 
@@ -221,6 +242,7 @@ def run_matrix_ucb_lrf():
         eta=5.0,
         warmup_fraction=0.05,
         sample_fraction=0.1,
+        objective_mode="pareto",
     )
     # Unlike matrix_ucb (which always uses async eval), LRF still uses parallel=True
     # for concurrent cell evaluation; sequential path is sync-only.
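
Note on `objective_mode="pareto"` with `matrix_ucb` / `matrix_ucb_lrf`: the docs say exploration uses Chebyshev scalarization over normalized gaps to the ideal point, with rotating trade-off directions. A minimal sketch of that idea follows, assuming gaps are already normalized to `[0, 1]` in the order (error, latency, cost). `chebyshev_reward`, `rotating_weights`, and the specific weight vectors are hypothetical; the rotation schedule the library actually uses is not shown in this diff.

```python
from itertools import cycle
from typing import Iterator, Sequence, Tuple


def chebyshev_reward(gaps: Sequence[float], weights: Sequence[float]) -> float:
    """Negated weighted Chebyshev distance to the ideal point (all gaps zero).

    Higher is better: the reward is driven by the worst weighted gap.
    """
    return -max(w * g for w, g in zip(weights, gaps))


def rotating_weights() -> Iterator[Tuple[float, float, float]]:
    """Hypothetical rotation over (error, latency, cost) trade-off directions."""
    return cycle([(0.6, 0.2, 0.2), (0.2, 0.6, 0.2), (0.2, 0.2, 0.6)])


if __name__ == "__main__":
    gaps = (0.2, 0.5, 0.1)  # normalized error, latency, cost for one cell
    directions = rotating_weights()
    for _ in range(3):
        weights = next(directions)
        print(weights, chebyshev_reward(gaps, weights))
```

Unlike the linear weighted sum, the max-based reward can favor combos on non-convex parts of the frontier, which is a common reason to pair it with Pareto search.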

diff --git a/examples/selection/local/ag2.py b/examples/selection/local/ag2.py
@@ -97,6 +97,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/crewai.py b/examples/selection/local/crewai.py
@@ -111,6 +111,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/custom_agent.py b/examples/selection/local/custom_agent.py
@@ -111,6 +111,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/langchain.py b/examples/selection/local/langchain.py
@@ -92,6 +92,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/langgraph.py b/examples/selection/local/langgraph.py
@@ -113,6 +113,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/llamaindex.py b/examples/selection/local/llamaindex.py
@@ -103,6 +103,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/selection/local/openai_sdk.py b/examples/selection/local/openai_sdk.py
@@ -88,6 +88,7 @@ def eval_fn(expected, actual):
         eval_fn=eval_fn,
         dataset=dataset,
         method="brute_force",  # or "auto" for smarter selection algorithms
+        objective_mode="pareto",
     )
 
     results = selector.select_best(parallel=True)

diff --git a/examples/shared/openclaw_agent.py b/examples/shared/openclaw_agent.py
@@ -20,6 +20,7 @@
         eval_fn=my_eval_fn,
         dataset=my_dataset,
         method="brute_force",
+        objective_mode="pareto",
     )
     results = selector.select_best(parallel=False)
 

diff --git a/src/agentopt/__init__.py b/src/agentopt/__init__.py
@@ -135,11 +135,10 @@ def ModelSelector(
             ``"epsilon_lucb"``, ``"matrix_ucb"``, ``"matrix_ucb_lrf"``,
             ``"threshold"``,
             ``"lm_proposal"``, ``"bayesian"``.
-        **kwargs: Additional arguments passed to the selector
-            (e.g. ``epsilon``, ``threshold``, ``sample_fraction``, ``warmup_fraction``
-            for matrix UCB-LRF; ``lambda_cost``, ``lambda_latency`` for the optional
-            combined objective ``score - lambda_cost*norm_cost -
-            lambda_latency*norm_latency`` — both default to ``0.0`` / accuracy-only).
+        **kwargs: Additional arguments passed to the selector. Required:
+            ``objective_mode`` — ``"weighted"`` (pass ``lambda_cost`` and/or
+            ``lambda_latency`` > 0) or ``"pareto"`` (frontier; Chebyshev matrix UCB).
+            Other options: ``epsilon``, ``threshold``, ``sample_fraction``, etc.
 
     Returns:
         A selector instance. Call ``.select_best()`` to run.

diff --git a/src/agentopt/model_selection/__init__.py b/src/agentopt/model_selection/__init__.py
@@ -9,6 +9,7 @@
 from .random_search import RandomSearchModelSelector
 from .threshold_successive_elimination import ThresholdBanditSEModelSelector
 from .matrix_ucb import MatrixUCBLRFModelSelector, MatrixUCBModelSelector
+from .objectives import ObjectiveMode
 
 # Bayesian is optional (requires torch/botorch)
 try:
@@ -31,4 +32,5 @@
     "DatapointResult",
     "ModelResult",
     "SelectionResults",
+    "ObjectiveMode",
 ]
diff --git a/src/agentopt/model_selection/arm_elimination.py b/src/agentopt/model_selection/arm_elimination.py
@@ -29,6 +29,7 @@ def __init__(
         confidence: float = 1.0,
         model_prices: Optional[Dict[str, Dict[str, float]]] = None,
         tracker=None,
+        objective_mode: Optional[str] = None,
         lambda_cost: float = 0.0,
         lambda_latency: float = 0.0,
     ) -> None:
@@ -39,6 +40,7 @@ def __init__(
             dataset=dataset,
             model_prices=model_prices,
             tracker=tracker,
+            objective_mode=objective_mode,
             lambda_cost=lambda_cost,
             lambda_latency=lambda_latency,
         )
@@ -152,7 +154,9 @@ def _select_sequential(self) -> SelectionResults:
         all_results = self._build_results(
             all_combos, combo_scores, combo_latencies, combo_costs, combo_dp_ids
         )
-        return SelectionResults(results=all_results)
+        return SelectionResults(
+            results=all_results, objective_mode=self.objective_mode,
+        )
 
     async def _select_async(self, max_concurrent: int = 20) -> SelectionResults:
         all_combos = self._all_combos()
@@ -274,7 +278,9 @@ async def _eval_batch(
         all_results = self._build_results(
             all_combos, combo_scores, combo_latencies, combo_costs, combo_dp_ids
         )
-        return SelectionResults(results=all_results)
+        return SelectionResults(
+            results=all_results, objective_mode=self.objective_mode,
+        )
 
     # ------------------------------------------------------------------
     # Statistical helpers
@@ -341,13 +347,16 @@ def _build_results(
                 )
 
         self._finalize_combined_objectives(all_results)
-        best_info = self._find_best(all_results)
-        if best_info is not None:
-            best_name, _ = best_info
-            for result in all_results:
-                if result.model_name == best_name:
-                    result.is_best = True
-                    break
+        if self.objective_mode == "pareto":
+            self._mark_pareto_optimal(all_results)
         else:
-            print("\n  No combinations succeeded.")
+            best_info = self._find_best(all_results)
+            if best_info is not None:
+                best_name, _ = best_info
+                for result in all_results:
+                    if result.model_name == best_name:
+                        result.is_best = True
+                        break
+            else:
+                print("\n  No combinations succeeded.")
         return all_results
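
Note on `_mark_pareto_optimal`: its body is not part of this diff. The sketch below shows the standard nondominated filter it presumably performs, assuming each result is reduced to a tuple of objectives to minimize (error, latency, cost). `dominates` and `pareto_front_indices` are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front_indices(points: Sequence[Point]) -> List[int]:
    """Indices of points that no other point dominates (all objectives minimized)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # (error, latency in seconds, cost in USD) per combination
    combos = [
        (0.1, 2.0, 0.5),  # accurate but slower and pricier
        (0.2, 1.0, 0.3),  # less accurate but faster and cheaper
        (0.3, 2.5, 0.6),  # worse than the first combo on every objective
    ]
    print(pareto_front_indices(combos))
```

This pairwise check is quadratic in the number of combinations, which is usually acceptable for the small candidate sets a selector evaluates.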