Generalize model config with family-based resolution#3802
Conversation
Replace the monolithic MODELS dict with a family-based resolution system so new model versions in existing families (e.g. glm-5.2) resolve automatically without an explicit config entry or PR. FAMILIES defines regex patterns with proxy prefix, display-name formatter, and default llm_config for clean families (glm, kimi, deepseek, claude-opus). Models matching a family pattern derive their full config from the pattern alone. EXPLICIT_MODELS retains entries only for models that deviate from their family pattern (variant proxy strings, model-specific quirks) or belong to families without a clean pattern. The MODELS dict is now a backward- compatible alias of EXPLICIT_MODELS. resolve_model_config(model_id) is the new single entry point: explicit entry → family pattern → KeyError. glm-5.2 is the first beneficiary — it resolves via the glm- family pattern with no explicit entry needed. Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
Family-based model resolution works for the newly generalized families, but functional QA found one existing explicit model now resolves to a different proxy model string than main.
Does this PR achieve its stated goal?
Partially. I verified by importing and calling the changed model-resolution entry points that glm-5.2 now resolves on the PR branch while it fails on main, and future IDs in the generalized glm, kimi, deepseek, and claude-opus families produce derived configs. However, the PR also changes the existing explicit claude-4.6-opus config from litellm_proxy/anthropic/claude-opus-4-6 to litellm_proxy/anthropic/claude-4-6, which is a behavior regression relative to the base branch for a model that should not be affected by family derivation.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the uv environment. |
| CI Status | Validate PR description is failing; several build/QA checks were still pending when checked. |
| Functional Verification |
Functional Verification
Test 1: New GLM version resolves without an explicit config entry
Step 1 — Establish baseline on origin/main:
Ran a Python script that imports .github/run-eval/resolve_model_config.py and calls find_models_by_id() with representative model IDs:
ERROR: Model ID 'glm-5.2' not found. Available models: ... glm-4.7, glm-5, glm-5.1, ...
glm-5.2: SystemExit(1)
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>
This confirms the stated baseline problem: a new GLM version (glm-5.2) cannot be resolved without an explicit entry on main.
Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.
Step 3 — Re-run with the PR in place:
Ran the same script, plus future-family IDs:
glm-5.2: id=glm-5.2 display=GLM-5.2 model=litellm_proxy/openrouter/z-ai/glm-5.2 temperature=0.0 disable_vision=True
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>
future glm-5.3: display=GLM-5.3 model=litellm_proxy/openrouter/z-ai/glm-5.3
future kimi-k9-thinking: display=Kimi K9 Thinking model=litellm_proxy/moonshot/kimi-k9-thinking
future deepseek-v9-flash: display=DeepSeek V9 Flash model=litellm_proxy/deepseek/deepseek-v9-flash
future claude-opus-9-1: display=Claude Opus 9.1 model=litellm_proxy/anthropic/claude-opus-9-1
ERROR: Model ID 'not-a-real-family-model' not found. Available explicit models: ... Models matching a family pattern (e.g. glm-*) also resolve automatically.
not-a-real-family-model: SystemExit(1)
This shows the new resolver achieves the core feature: family-pattern model IDs now resolve automatically, while an unrelated invalid ID still fails fast.
Test 2: Existing explicit model configs remain stable
Step 1 — Establish baseline on origin/main:
Ran find_models_by_id() for representative explicit models:
BASE claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-opus-4-6', 'temperature': 0.0}
BASE kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
BASE kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
BASE deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}
Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.
Step 3 — Re-run with the PR in place:
PR claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-4-6', 'temperature': 0.0}
PR kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
PR kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
PR deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}
This shows most sampled explicit models stayed stable, but claude-4.6-opus changed its proxy model string.
Issues Found
- 🟠 Issue: Existing model ID
claude-4.6-opusnow resolves tolitellm_proxy/anthropic/claude-4-6instead of the base branch valuelitellm_proxy/anthropic/claude-opus-4-6. This appears unrelated to family-based resolution and could break users selecting that existing model.
This review was created by an AI agent (OpenHands) on behalf of the user.
Regression introduced during the refactor — the proxy string was accidentally changed from claude-opus-4-6 to claude-4-6. Restored to match main. Co-authored-by: openhands <openhands@all-hands.dev>
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste — Elegant solution to a real problem.
The design is clean: regex-pattern families for the common case, explicit entries only for genuine deviations. The single resolve_model_config() entry point is the right abstraction. The MODELS backward-compatibility alias is handled correctly.
The claude-4.6-opus regression caught by the earlier QA review has been fixed in this commit (9745166) — verified that both main and the PR head have litellm_proxy/anthropic/claude-opus-4-6.
One minor note left as a non-blocking inline comment.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Internal CI script (.github/run-eval/), not a public API. The backward-compatibleMODELSalias preservespatch.dicttest compatibility. The refactor is covered by 44 tests including representative family-derived IDs. Blast radius is limited to eval runs.
VERDICT:
✅ Worth merging — Core logic is sound, existing regression fixed, tests solid.
KEY INSIGHT:
Replacing a flat dict of explicit entries with regex-pattern family defaults is exactly the right data structure choice — it eliminates an entire class of mechanical PRs.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.- Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it is merge-ready.Was this review helpful? React with 👍 or 👎 to give feedback.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
The setup-matrix and run-eval workflows imported MODELS directly and checked membership against it, which excluded family-derived models (e.g. deepseek-v4-flash, glm-5.2) that are not in EXPLICIT_MODELS but resolve via family patterns. Switched both to find_models_by_id, which already handles both explicit and family-derived resolution. Co-authored-by: openhands <openhands@all-hands.dev>
Matches the safety of the family-derived path, which already builds a fresh llm_config dict. Prevents callers from mutating the global EXPLICIT_MODELS entry. Co-authored-by: openhands <openhands@all-hands.dev>
|
All review feedback has been addressed:
CI is fully green (33 checks passing) and all review threads are resolved. Ready for merge. This comment was posted by an AI agent (OpenHands) on behalf of the user. |
HUMAN: Generalize model config resolution so new model versions in existing families (e.g. glm-5.2) resolve automatically without a PR.
AGENT:
Why
Every new model version required a 7-line config block in
MODELS, even when the config was identical to other models in the same family (same proxy prefix, temperature, and flags). This meant a PR for every new model version — e.g. addingglm-5.2would have required another explicit entry.Summary
MODELSdict withFAMILIES(regex-pattern-based defaults) +EXPLICIT_MODELS(only deviant/quirky models)resolve_model_config(model_id)as the single entry point: explicit entry → family pattern →KeyErrorglm-5.2resolves automatically via the^glm-pattern with zero config entriesIssue Number
Relates to OpenHands/openhands-index-results#1224 (Benchmark GLM-5.2)
How to Test
All 44 tests pass, including the new
test_glm_5_2_configwhich verifies glm-5.2 resolves correctly:id:glm-5.2display_name:GLM-5.2llm_config.model:litellm_proxy/openrouter/z-ai/glm-5.2llm_config.temperature:0.0llm_config.disable_vision:TrueAlso verified manually that
find_models_by_id(["glm-5.2"])returns the correct config, and thatresolve_model_config("glm-5.2")derives all fields from the glm family pattern.Video/Screenshots
N/A — code refactor with unit test verification.
Type
Notes
MODELSis retained as a backward-compatible alias ofEXPLICIT_MODELSforpatch.dicttest compatibility.find_models_by_idchecksMODELSfirst, then falls back toresolve_model_configfor family-derived models.This PR was created by an AI agent (OpenHands) on behalf of the user.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:7e2454b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
7e2454b-python) is a multi-arch manifest supporting both amd64 and arm647e2454b-python-amd64) are also available if needed