Generalize model config with family-based resolution#3802

Merged
neubig merged 4 commits into main from generalize-model-config
Jun 19, 2026

@neubig

@neubig neubig commented Jun 19, 2026

Member

HUMAN: Generalize model config resolution so new model versions in existing families (e.g. glm-5.2) resolve automatically without a PR.


AGENT:

Why

Every new model version required a 7-line config block in MODELS, even when the config was identical to other models in the same family (same proxy prefix, temperature, and flags). This meant a PR for every new model version — e.g. adding glm-5.2 would have required another explicit entry.

Summary

  • Replaced the monolithic MODELS dict with FAMILIES (regex-pattern-based defaults) + EXPLICIT_MODELS (only models that deviate from their family pattern or belong to a family without a clean pattern)
  • Added resolve_model_config(model_id) as the single entry point: explicit entry → family pattern → KeyError
  • 4 families generalized: glm, kimi, deepseek, claude-opus — these cover the most frequently-updated model families
  • glm-5.2 resolves automatically via the ^glm- pattern with zero config entries
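
The resolution order described in the summary can be sketched as follows. This is a minimal illustration, not the code from the PR: the dictionary keys (pattern, proxy_prefix, display_name, llm_config) are assumed field names, and only the glm family and one explicit entry are shown, using values quoted elsewhere in this conversation.

```python
import re
from typing import Any

# Assumed layout: the real FAMILIES / EXPLICIT_MODELS structures are not
# shown on this page, so every key name below is hypothetical.
FAMILIES: list[dict[str, Any]] = [
    {
        "pattern": re.compile(r"^glm-"),
        "proxy_prefix": "litellm_proxy/openrouter/z-ai/",
        "display_name": lambda model_id: model_id.upper(),
        "llm_config": {"temperature": 0.0, "disable_vision": True},
    },
]

# Only models that cannot be derived from a family pattern live here.
EXPLICIT_MODELS: dict[str, dict[str, Any]] = {
    "deepseek-v3.2-reasoner": {
        "id": "deepseek-v3.2-reasoner",
        "display_name": "DeepSeek V3.2 Reasoner",
        "llm_config": {"model": "litellm_proxy/deepseek/deepseek-reasoner"},
    },
}


def resolve_model_config(model_id: str) -> dict[str, Any]:
    """Explicit entry first, then family pattern, otherwise KeyError."""
    if model_id in EXPLICIT_MODELS:
        return EXPLICIT_MODELS[model_id]
    for family in FAMILIES:
        if family["pattern"].match(model_id):
            # Build a fresh config from the family defaults.
            return {
                "id": model_id,
                "display_name": family["display_name"](model_id),
                "llm_config": {
                    "model": family["proxy_prefix"] + model_id,
                    **family["llm_config"],
                },
            }
    raise KeyError(model_id)
```

With this sketch, resolve_model_config("glm-5.2") yields the GLM-5.2 display name and the litellm_proxy/openrouter/z-ai/glm-5.2 model string without any glm-5.2 entry existing anywhere.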

Issue Number

Relates to OpenHands/openhands-index-results#1224 (Benchmark GLM-5.2)

How to Test

cd ~/work/software-agent-sdk
uv run pytest tests/cross/test_resolve_model_config.py -v

All 44 tests pass, including the new test_glm_5_2_config which verifies glm-5.2 resolves correctly:

  • id: glm-5.2
  • display_name: GLM-5.2
  • llm_config.model: litellm_proxy/openrouter/z-ai/glm-5.2
  • llm_config.temperature: 0.0
  • llm_config.disable_vision: True

Also verified manually that find_models_by_id(["glm-5.2"]) returns the correct config, and that resolve_model_config("glm-5.2") derives all fields from the glm family pattern.

Video/Screenshots

N/A — code refactor with unit test verification.

Type

  • Refactor

Notes

MODELS is retained as a backward-compatible alias of EXPLICIT_MODELS for patch.dict test compatibility. find_models_by_id checks MODELS first, then falls back to resolve_model_config for family-derived models.
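
A small sketch of why an alias keeps patch.dict-based tests working, assuming MODELS is bound to the same dict object as EXPLICIT_MODELS (the PR text says "alias" but the exact binding is not shown here). The lookup_explicit helper and the test-model entry are hypothetical.

```python
from typing import Any, Optional
from unittest.mock import patch

EXPLICIT_MODELS: dict[str, dict[str, Any]] = {
    "kimi-k2.5": {"id": "kimi-k2.5", "display_name": "Kimi K2.5"},
}
# Assumption: the alias is the same object, not a copy.
MODELS = EXPLICIT_MODELS


def lookup_explicit(model_id: str) -> Optional[dict[str, Any]]:
    """Hypothetical helper that reads from EXPLICIT_MODELS only."""
    return EXPLICIT_MODELS.get(model_id)


# A test written against the old name still affects the new dict,
# and patch.dict restores the original contents on exit.
with patch.dict(MODELS, {"test-model": {"id": "test-model", "display_name": "Test"}}):
    inside = lookup_explicit("test-model")
after = lookup_explicit("test-model")
```

Here inside holds the temporary entry and after is None. If MODELS were a copy rather than the same object, the patch would not be visible through EXPLICIT_MODELS.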

This PR was created by an AI agent (OpenHands) on behalf of the user.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7e2454b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7e2454b-python \
  ghcr.io/openhands/agent-server:7e2454b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7e2454b-golang-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang-amd64
ghcr.io/openhands/agent-server:generalize-model-config-golang-amd64
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7e2454b-golang-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang-arm64
ghcr.io/openhands/agent-server:generalize-model-config-golang-arm64
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7e2454b-java-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java-amd64
ghcr.io/openhands/agent-server:generalize-model-config-java-amd64
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7e2454b-java-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java-arm64
ghcr.io/openhands/agent-server:generalize-model-config-java-arm64
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7e2454b-python-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python-amd64
ghcr.io/openhands/agent-server:generalize-model-config-python-amd64
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:7e2454b-python-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python-arm64
ghcr.io/openhands/agent-server:generalize-model-config-python-arm64
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:7e2454b-golang
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang
ghcr.io/openhands/agent-server:generalize-model-config-golang
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:7e2454b-java
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java
ghcr.io/openhands/agent-server:generalize-model-config-java
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:7e2454b-python
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python
ghcr.io/openhands/agent-server:generalize-model-config-python
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 7e2454b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7e2454b-python-amd64) are also available if needed

Replace the monolithic MODELS dict with a family-based resolution system
so new model versions in existing families (e.g. glm-5.2) resolve
automatically without an explicit config entry or PR.

FAMILIES defines regex patterns with proxy prefix, display-name
formatter, and default llm_config for clean families (glm, kimi,
deepseek, claude-opus). Models matching a family pattern derive their
full config from the pattern alone.

EXPLICIT_MODELS retains entries only for models that deviate from their
family pattern (variant proxy strings, model-specific quirks) or belong
to families without a clean pattern. The MODELS dict is now a backward-
compatible alias of EXPLICIT_MODELS.

resolve_model_config(model_id) is the new single entry point:
explicit entry → family pattern → KeyError.

glm-5.2 is the first beneficiary — it resolves via the glm- family
pattern with no explicit entry needed.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@all-hands-bot all-hands-bot left a comment

Collaborator

⚠️ QA Report: PASS WITH ISSUES

Family-based model resolution works for the newly generalized families, but functional QA found one existing explicit model now resolves to a different proxy model string than main.

Does this PR achieve its stated goal?

Partially. I verified by importing and calling the changed model-resolution entry points that glm-5.2 now resolves on the PR branch while it fails on main, and future IDs in the generalized glm, kimi, deepseek, and claude-opus families produce derived configs. However, the PR also changes the existing explicit claude-4.6-opus config from litellm_proxy/anthropic/claude-opus-4-6 to litellm_proxy/anthropic/claude-4-6, which is a behavior regression relative to the base branch for a model that should not be affected by family derivation.

| Phase | Result |
|---|---|
| Environment Setup | make build completed successfully and installed the uv environment. |
| CI Status | ⚠️ Many checks pass, but Validate PR description is failing; several build/QA checks were still pending when checked. |
| Functional Verification | ⚠️ New family-derived resolution works; one explicit-model regression found. |

Functional Verification

Test 1: New GLM version resolves without an explicit config entry

Step 1 — Establish baseline on origin/main:
Ran a Python script that imports .github/run-eval/resolve_model_config.py and calls find_models_by_id() with representative model IDs:

ERROR: Model ID 'glm-5.2' not found. Available models: ... glm-4.7, glm-5, glm-5.1, ...
glm-5.2: SystemExit(1)
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>

This confirms the stated baseline problem: a new GLM version (glm-5.2) cannot be resolved without an explicit entry on main.

Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.

Step 3 — Re-run with the PR in place:
Ran the same script, plus future-family IDs:

glm-5.2: id=glm-5.2 display=GLM-5.2 model=litellm_proxy/openrouter/z-ai/glm-5.2 temperature=0.0 disable_vision=True
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>
future glm-5.3: display=GLM-5.3 model=litellm_proxy/openrouter/z-ai/glm-5.3
future kimi-k9-thinking: display=Kimi K9 Thinking model=litellm_proxy/moonshot/kimi-k9-thinking
future deepseek-v9-flash: display=DeepSeek V9 Flash model=litellm_proxy/deepseek/deepseek-v9-flash
future claude-opus-9-1: display=Claude Opus 9.1 model=litellm_proxy/anthropic/claude-opus-9-1
ERROR: Model ID 'not-a-real-family-model' not found. Available explicit models: ... Models matching a family pattern (e.g. glm-*) also resolve automatically.
not-a-real-family-model: SystemExit(1)

This shows the new resolver achieves the core feature: family-pattern model IDs now resolve automatically, while an unrelated invalid ID still fails fast.
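
The display names in the output above follow a different convention per family. The sketch below reproduces those four outputs with hypothetical formatter functions. Only the ^glm- pattern is quoted in the PR description; the other three patterns and all helper names are assumptions made for illustration.

```python
import re


def _title_tokens(model_id: str) -> str:
    # "kimi-k9-thinking" -> "Kimi K9 Thinking"
    return " ".join(token.capitalize() for token in model_id.split("-"))


def _deepseek_name(model_id: str) -> str:
    # Keeps the brand capitalization "DeepSeek", title-cases the rest.
    rest = model_id.removeprefix("deepseek-")
    return "DeepSeek " + _title_tokens(rest)


def _claude_opus_name(model_id: str) -> str:
    # "claude-opus-9-1" -> "Claude Opus 9.1"
    version = model_id.removeprefix("claude-opus-").replace("-", ".")
    return f"Claude Opus {version}"


# (pattern, formatter) pairs; patterns other than ^glm- are assumed.
FORMATTERS = [
    (re.compile(r"^glm-"), str.upper),
    (re.compile(r"^kimi-"), _title_tokens),
    (re.compile(r"^deepseek-"), _deepseek_name),
    (re.compile(r"^claude-opus-"), _claude_opus_name),
]


def display_name(model_id: str) -> str:
    """Return the display name from the first matching family formatter."""
    for pattern, formatter in FORMATTERS:
        if pattern.match(model_id):
            return formatter(model_id)
    raise KeyError(model_id)
```

An ID that matches no pattern, such as not-a-real-family-model, raises KeyError in this sketch, which mirrors the fail-fast behavior the QA run observed.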

Test 2: Existing explicit model configs remain stable

Step 1 — Establish baseline on origin/main:
Ran find_models_by_id() for representative explicit models:

BASE claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-opus-4-6', 'temperature': 0.0}
BASE kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
BASE kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
BASE deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}

Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.

Step 3 — Re-run with the PR in place:

PR   claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-4-6', 'temperature': 0.0}
PR   kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
PR   kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
PR   deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}

This shows most sampled explicit models stayed stable, but claude-4.6-opus changed its proxy model string.

Issues Found

  • 🟠 Issue: Existing model ID claude-4.6-opus now resolves to litellm_proxy/anthropic/claude-4-6 instead of the base branch value litellm_proxy/anthropic/claude-opus-4-6. This appears unrelated to family-based resolution and could break users selecting that existing model.
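
A base-versus-PR comparison like the one in Test 2 can be automated with a small dictionary diff. The sketch below is illustrative only: diff_llm_configs is a hypothetical helper, and the two mappings are typed in from the BASE and PR output lines above rather than loaded from either branch.

```python
from typing import Any

ConfigMap = dict[str, dict[str, Any]]


def diff_llm_configs(base: ConfigMap, pr: ConfigMap) -> dict[str, tuple[Any, Any]]:
    """Return {model_id: (base_config, pr_config)} for every config that differs."""
    changes = {}
    for model_id in sorted(base.keys() | pr.keys()):
        before, after = base.get(model_id), pr.get(model_id)
        if before != after:
            changes[model_id] = (before, after)
    return changes


base: ConfigMap = {
    "claude-4.6-opus": {"model": "litellm_proxy/anthropic/claude-opus-4-6", "temperature": 0.0},
    "kimi-k2.6": {"model": "litellm_proxy/moonshot/kimi-k2.6", "temperature": 1.0, "inline_image_urls": True},
    "kimi-k2.5": {"model": "litellm_proxy/moonshot/kimi-k2.5", "temperature": 1.0, "top_p": 0.95},
    "deepseek-v3.2-reasoner": {"model": "litellm_proxy/deepseek/deepseek-reasoner"},
}
pr: ConfigMap = {
    **base,
    # The one value that differed on the PR commit under test.
    "claude-4.6-opus": {"model": "litellm_proxy/anthropic/claude-4-6", "temperature": 0.0},
}

changes = diff_llm_configs(base, pr)
```

For these inputs, changes contains a single key, claude-4.6-opus, which matches the one regression reported here.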

This review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread .github/run-eval/resolve_model_config.py Outdated
Regression introduced during the refactor — the proxy string was
accidentally changed from claude-opus-4-6 to claude-4-6. Restored
to match main.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig added the review-this This label triggers a PR review by OpenHands label Jun 19, 2026

all-hands-bot commented Jun 19, 2026

Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Collaborator

🟢 Good taste — Elegant solution to a real problem.

The design is clean: regex-pattern families for the common case, explicit entries only for genuine deviations. The single resolve_model_config() entry point is the right abstraction. The MODELS backward-compatibility alias is handled correctly.

The claude-4.6-opus regression caught by the earlier QA review has been fixed in this commit (9745166) — verified that both main and the PR head have litellm_proxy/anthropic/claude-opus-4-6.

One minor note left as a non-blocking inline comment.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
    Internal CI script (.github/run-eval/), not a public API. The backward-compatible MODELS alias preserves patch.dict test compatibility. The refactor is covered by 44 tests including representative family-derived IDs. Blast radius is limited to eval runs.

VERDICT:
Worth merging — Core logic is sound, existing regression fixed, tests solid.

KEY INSIGHT:
Replacing a flat dict of explicit entries with regex-pattern family defaults is exactly the right data structure choice — it eliminates an entire class of mechanical PRs.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing. See the customization docs for the required frontmatter format.
  2. Re-request a review — the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it is merge-ready.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread .github/run-eval/resolve_model_config.py Outdated
The setup-matrix and run-eval workflows imported MODELS directly and
checked membership against it, which excluded family-derived models
(e.g. deepseek-v4-flash, glm-5.2) that are not in EXPLICIT_MODELS but
resolve via family patterns. Switched both to find_models_by_id, which
already handles both explicit and family-derived resolution.

Co-authored-by: openhands <openhands@all-hands.dev>
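
The difference between the two checks in the commit above can be sketched as follows. The stub resolver, the single glm pattern, and both is_known_* helpers are hypothetical; the real workflows call find_models_by_id, whose signature is not shown on this page.

```python
import re
from typing import Any

EXPLICIT_MODELS: dict[str, dict[str, Any]] = {"kimi-k2.5": {"id": "kimi-k2.5"}}
MODELS = EXPLICIT_MODELS  # backward-compatible alias
GLM_PATTERN = re.compile(r"^glm-")


def resolve_model_config(model_id: str) -> dict[str, Any]:
    """Stub resolver: explicit entry, then the glm family, else KeyError."""
    if model_id in EXPLICIT_MODELS:
        return EXPLICIT_MODELS[model_id]
    if GLM_PATTERN.match(model_id):
        return {"id": model_id}
    raise KeyError(model_id)


def is_known_by_membership(model_id: str) -> bool:
    # Old style: only sees explicit entries.
    return model_id in MODELS


def is_known_by_resolver(model_id: str) -> bool:
    # New style: also accepts family-derived IDs.
    try:
        resolve_model_config(model_id)
    except KeyError:
        return False
    return True
```

In this sketch glm-5.2 is rejected by the membership check but accepted by the resolver-based check, which is the gap the workflow change closes.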
Deep-copy llm_config in the explicit-model path. This matches the
safety of the family-derived path, which already builds a fresh
llm_config dict, and prevents callers from mutating the global
EXPLICIT_MODELS entry.

Co-authored-by: openhands <openhands@all-hands.dev>
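
The mutation hazard this commit guards against can be shown with a short sketch. get_explicit_config is a hypothetical stand-in for the explicit-model branch of the resolver, and it copies the whole entry for simplicity; the entry values are taken from the kimi-k2.5 output quoted earlier.

```python
import copy
from typing import Any

EXPLICIT_MODELS: dict[str, dict[str, Any]] = {
    "kimi-k2.5": {
        "id": "kimi-k2.5",
        "display_name": "Kimi K2.5",
        "llm_config": {
            "model": "litellm_proxy/moonshot/kimi-k2.5",
            "temperature": 1.0,
            "top_p": 0.95,
        },
    },
}


def get_explicit_config(model_id: str) -> dict[str, Any]:
    """Return an independent copy so callers cannot alter the global entry."""
    return copy.deepcopy(EXPLICIT_MODELS[model_id])


cfg = get_explicit_config("kimi-k2.5")
cfg["llm_config"]["temperature"] = 0.0  # caller-side override

# The shared registry still holds the original value.
unchanged = EXPLICIT_MODELS["kimi-k2.5"]["llm_config"]["temperature"]
```

Without the copy, the override on cfg would write through to EXPLICIT_MODELS and leak into every later lookup of the same model.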
@neubig

neubig commented Jun 19, 2026

Member Author

All review feedback has been addressed:

  • Fixed claude-4.6-opus proxy string regression (was accidentally changed to claude-4-6, restored to claude-opus-4-6 per review comment)
  • Deep-copied llm_config in the explicit-model path to match the family-derived path's safety (per review suggestion)
  • Updated integration-runner.yml and run-eval.yml to use find_models_by_id so family-derived models (e.g. deepseek-v4-flash, glm-5.2) resolve correctly in CI

CI is fully green (33 checks passing) and all review threads are resolved. Ready for merge.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

@neubig neubig merged commit 740fe63 into main Jun 19, 2026
36 of 37 checks passed
@neubig neubig deleted the generalize-model-config branch June 19, 2026 16:42