Generalize model config with family-based resolution by neubig · Pull Request #3802 · OpenHands/software-agent-sdk

neubig · 2026-06-19T12:27:07Z

HUMAN: Generalize model config resolution so new model versions in existing families (e.g. glm-5.2) resolve automatically without a PR.

AGENT:

Why

Every new model version required a 7-line config block in MODELS, even when the config was identical to other models in the same family (same proxy prefix, temperature, and flags). This meant a PR for every new model version — e.g. adding glm-5.2 would have required another explicit entry.

Summary

Replaced the monolithic MODELS dict with FAMILIES (regex-pattern-based defaults) + EXPLICIT_MODELS (only deviant/quirky models)
Added resolve_model_config(model_id) as the single entry point: explicit entry → family pattern → KeyError
4 families generalized: glm, kimi, deepseek, claude-opus — these cover the most frequently-updated model families
glm-5.2 resolves automatically via the ^glm- pattern with zero config entries
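
The resolution order in the summary can be sketched as follows. This is a minimal illustration, not the PR's actual code: the shapes of FAMILIES and EXPLICIT_MODELS, the field names inside each family entry, and the use of `str.upper` as the glm display-name formatter are all assumptions made for the example.

```python
import re

# Hypothetical shapes: the real FAMILIES and EXPLICIT_MODELS may differ.
FAMILIES = [
    {
        "pattern": re.compile(r"^glm-"),
        "proxy_prefix": "litellm_proxy/openrouter/z-ai/",
        "display_name": str.upper,  # assumed formatter: "glm-5.2" -> "GLM-5.2"
        "llm_config": {"temperature": 0.0, "disable_vision": True},
    },
]

EXPLICIT_MODELS = {
    # A model whose proxy string cannot be derived from its ID.
    "deepseek-v3.2-reasoner": {
        "id": "deepseek-v3.2-reasoner",
        "display_name": "DeepSeek V3.2 Reasoner",
        "llm_config": {"model": "litellm_proxy/deepseek/deepseek-reasoner"},
    },
}


def resolve_model_config(model_id: str) -> dict:
    """Resolve in order: explicit entry, then family pattern, else KeyError."""
    if model_id in EXPLICIT_MODELS:
        return EXPLICIT_MODELS[model_id]
    for family in FAMILIES:
        if family["pattern"].match(model_id):
            return {
                "id": model_id,
                "display_name": family["display_name"](model_id),
                "llm_config": {
                    "model": family["proxy_prefix"] + model_id,
                    **family["llm_config"],
                },
            }
    raise KeyError(f"Model ID {model_id!r} not found")
```

Checking the explicit dict first means a quirky model always wins over its family pattern, so adding an explicit entry is enough to override a derived config.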

Issue Number

Relates to OpenHands/openhands-index-results#1224 (Benchmark GLM-5.2)

How to Test

cd ~/work/software-agent-sdk
uv run pytest tests/cross/test_resolve_model_config.py -v

All 44 tests pass, including the new test_glm_5_2_config which verifies glm-5.2 resolves correctly:

id: glm-5.2
display_name: GLM-5.2
llm_config.model: litellm_proxy/openrouter/z-ai/glm-5.2
llm_config.temperature: 0.0
llm_config.disable_vision: True

Also verified manually that find_models_by_id(["glm-5.2"]) returns the correct config, and that resolve_model_config("glm-5.2") derives all fields from the glm family pattern.
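
The dotted field listing above (llm_config.model and so on) can be reproduced from a nested config dict with a small flattening helper. This is only a sketch: `flatten` is a hypothetical helper for illustration and is not claimed to exist in the repository; the config values are copied from the listing.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. llm_config.model."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


# Values taken from the glm-5.2 listing in this PR description.
config = {
    "id": "glm-5.2",
    "display_name": "GLM-5.2",
    "llm_config": {
        "model": "litellm_proxy/openrouter/z-ai/glm-5.2",
        "temperature": 0.0,
        "disable_vision": True,
    },
}

for key, value in flatten(config).items():
    print(f"{key}: {value}")
```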

Video/Screenshots

N/A — code refactor with unit test verification.

Type

Refactor

Notes

MODELS is retained as a backward-compatible alias of EXPLICIT_MODELS for patch.dict test compatibility. find_models_by_id checks MODELS first, then falls back to resolve_model_config for family-derived models.
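
A sketch of why the alias keeps patch.dict-based tests working, assuming MODELS is bound to the same dict object as EXPLICIT_MODELS. The bodies of `resolve_model_config` and `find_models_by_id` below are simplified stand-ins, and "fake-model" is a made-up test ID.

```python
from unittest.mock import patch

EXPLICIT_MODELS = {
    "kimi-k2.5": {
        "id": "kimi-k2.5",
        "llm_config": {
            "model": "litellm_proxy/moonshot/kimi-k2.5",
            "temperature": 1.0,
            "top_p": 0.95,
        },
    },
}
MODELS = EXPLICIT_MODELS  # alias: both names refer to the same dict object


def resolve_model_config(model_id: str) -> dict:
    """Simplified stand-in: explicit entry, then a single glm family rule."""
    if model_id in EXPLICIT_MODELS:
        return EXPLICIT_MODELS[model_id]
    if model_id.startswith("glm-"):
        return {
            "id": model_id,
            "llm_config": {"model": "litellm_proxy/openrouter/z-ai/" + model_id},
        }
    raise KeyError(model_id)


def find_models_by_id(model_ids: list) -> list:
    """Check MODELS first, then fall back to family-derived resolution."""
    found = []
    for model_id in model_ids:
        if model_id in MODELS:
            found.append(MODELS[model_id])
        else:
            found.append(resolve_model_config(model_id))
    return found


# Patching MODELS in a test also affects EXPLICIT_MODELS, since they are one dict.
fake = {"fake-model": {"id": "fake-model", "llm_config": {"model": "fake"}}}
with patch.dict(MODELS, fake):
    patched = find_models_by_id(["fake-model", "glm-5.2"])

print([m["id"] for m in patched])       # ['fake-model', 'glm-5.2']
print("fake-model" in EXPLICIT_MODELS)  # False: patch.dict restored the dict
```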

This PR was created by an AI agent (OpenHands) on behalf of the user.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7e2454b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7e2454b-python \
  ghcr.io/openhands/agent-server:7e2454b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7e2454b-golang-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang-amd64
ghcr.io/openhands/agent-server:generalize-model-config-golang-amd64
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7e2454b-golang-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang-arm64
ghcr.io/openhands/agent-server:generalize-model-config-golang-arm64
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7e2454b-java-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java-amd64
ghcr.io/openhands/agent-server:generalize-model-config-java-amd64
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7e2454b-java-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java-arm64
ghcr.io/openhands/agent-server:generalize-model-config-java-arm64
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7e2454b-python-amd64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python-amd64
ghcr.io/openhands/agent-server:generalize-model-config-python-amd64
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:7e2454b-python-arm64
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python-arm64
ghcr.io/openhands/agent-server:generalize-model-config-python-arm64
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:7e2454b-golang
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-golang
ghcr.io/openhands/agent-server:generalize-model-config-golang
ghcr.io/openhands/agent-server:7e2454b-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:7e2454b-java
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-java
ghcr.io/openhands/agent-server:generalize-model-config-java
ghcr.io/openhands/agent-server:7e2454b-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:7e2454b-python
ghcr.io/openhands/agent-server:7e2454b29c344db75f7ce091ed9bacf288d0c952-python
ghcr.io/openhands/agent-server:generalize-model-config-python
ghcr.io/openhands/agent-server:7e2454b-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., 7e2454b-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 7e2454b-python-amd64) are also available if needed

Replace the monolithic MODELS dict with a family-based resolution system so new model versions in existing families (e.g. glm-5.2) resolve automatically without an explicit config entry or PR.

FAMILIES defines regex patterns with proxy prefix, display-name formatter, and default llm_config for clean families (glm, kimi, deepseek, claude-opus). Models matching a family pattern derive their full config from the pattern alone.

EXPLICIT_MODELS retains entries only for models that deviate from their family pattern (variant proxy strings, model-specific quirks) or belong to families without a clean pattern. The MODELS dict is now a backward-compatible alias of EXPLICIT_MODELS.

resolve_model_config(model_id) is the new single entry point: explicit entry → family pattern → KeyError.

glm-5.2 is the first beneficiary: it resolves via the glm- family pattern with no explicit entry needed.

Co-authored-by: openhands <openhands@all-hands.dev>
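
The commit message mentions a display-name formatter per family but does not show one. The sketch below is a guess at what such formatters might look like, written to reproduce the display names reported in the QA output for this PR (GLM-5.3, Kimi K9 Thinking, DeepSeek V9 Flash, Claude Opus 9.1). All function names here are hypothetical.

```python
import re


def glm_display(model_id: str) -> str:
    """Uppercase the whole ID: glm-5.3 becomes GLM-5.3."""
    return model_id.upper()


def title_words(model_id: str, brand: str) -> str:
    """Replace the first dash-separated word with brand, capitalize the rest."""
    _, *rest = model_id.split("-")
    return " ".join([brand, *(word.capitalize() for word in rest)])


def claude_opus_display(model_id: str) -> str:
    """Join the two version numbers with a dot: claude-opus-9-1 becomes 9.1."""
    match = re.fullmatch(r"claude-opus-(\d+)-(\d+)", model_id)
    if match is None:
        raise ValueError(f"unexpected claude-opus ID: {model_id!r}")
    return f"Claude Opus {match.group(1)}.{match.group(2)}"


print(glm_display("glm-5.3"))
print(title_words("kimi-k9-thinking", "Kimi"))
print(title_words("deepseek-v9-flash", "DeepSeek"))
print(claude_opus_display("claude-opus-9-1"))
```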

github-actions · 2026-06-19T12:27:29Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-19T12:27:44Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Family-based model resolution works for the newly generalized families, but functional QA found one existing explicit model now resolves to a different proxy model string than main.

Does this PR achieve its stated goal?

Partially. I verified by importing and calling the changed model-resolution entry points that glm-5.2 now resolves on the PR branch while it fails on main, and future IDs in the generalized glm, kimi, deepseek, and claude-opus families produce derived configs. However, the PR also changes the existing explicit claude-4.6-opus config from litellm_proxy/anthropic/claude-opus-4-6 to litellm_proxy/anthropic/claude-4-6, which is a behavior regression relative to the base branch for a model that should not be affected by family derivation.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the uv environment.
CI Status	⚠️ Many checks pass, but `Validate PR description` is failing; several build/QA checks were still pending when checked.
Functional Verification	⚠️ New family-derived resolution works; one explicit-model regression found.

Functional Verification

Test 1: New GLM version resolves without an explicit config entry

Step 1 — Establish baseline on origin/main:
Ran a Python script that imports .github/run-eval/resolve_model_config.py and calls find_models_by_id() with representative model IDs:

ERROR: Model ID 'glm-5.2' not found. Available models: ... glm-4.7, glm-5, glm-5.1, ...
glm-5.2: SystemExit(1)
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>

This confirms the stated baseline problem: a new GLM version (glm-5.2) cannot be resolved without an explicit entry on main.

Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.

Step 3 — Re-run with the PR in place:
Ran the same script, plus future-family IDs:

glm-5.2: id=glm-5.2 display=GLM-5.2 model=litellm_proxy/openrouter/z-ai/glm-5.2 temperature=0.0 disable_vision=True
glm-5: id=glm-5 display=GLM-5 model=litellm_proxy/openrouter/z-ai/glm-5 temperature=0.0 disable_vision=True
kimi-k2-thinking: id=kimi-k2-thinking display=Kimi K2 Thinking model=litellm_proxy/moonshot/kimi-k2-thinking temperature=1.0 disable_vision=<unset>
deepseek-v4-pro: id=deepseek-v4-pro display=DeepSeek V4 Pro model=litellm_proxy/deepseek/deepseek-v4-pro temperature=<unset> disable_vision=<unset>
claude-opus-4-8: id=claude-opus-4-8 display=Claude Opus 4.8 model=litellm_proxy/anthropic/claude-opus-4-8 temperature=<unset> disable_vision=<unset>
future glm-5.3: display=GLM-5.3 model=litellm_proxy/openrouter/z-ai/glm-5.3
future kimi-k9-thinking: display=Kimi K9 Thinking model=litellm_proxy/moonshot/kimi-k9-thinking
future deepseek-v9-flash: display=DeepSeek V9 Flash model=litellm_proxy/deepseek/deepseek-v9-flash
future claude-opus-9-1: display=Claude Opus 9.1 model=litellm_proxy/anthropic/claude-opus-9-1
ERROR: Model ID 'not-a-real-family-model' not found. Available explicit models: ... Models matching a family pattern (e.g. glm-*) also resolve automatically.
not-a-real-family-model: SystemExit(1)

This shows the new resolver achieves the core feature: family-pattern model IDs now resolve automatically, while an unrelated invalid ID still fails fast.

Test 2: Existing explicit model configs remain stable

Step 1 — Establish baseline on origin/main:
Ran find_models_by_id() for representative explicit models:

BASE claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-opus-4-6', 'temperature': 0.0}
BASE kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
BASE kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
BASE deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}

Step 2 — Apply the PR's changes:
Checked out commit e0eb77c254f865603f95c3f84e2e26f6b9b3f486.

Step 3 — Re-run with the PR in place:

PR   claude-4.6-opus: display=Claude 4.6 Opus llm_config={'model': 'litellm_proxy/anthropic/claude-4-6', 'temperature': 0.0}
PR   kimi-k2.6: display=Kimi K2.6 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.6', 'temperature': 1.0, 'inline_image_urls': True}
PR   kimi-k2.5: display=Kimi K2.5 llm_config={'model': 'litellm_proxy/moonshot/kimi-k2.5', 'temperature': 1.0, 'top_p': 0.95}
PR   deepseek-v3.2-reasoner: display=DeepSeek V3.2 Reasoner llm_config={'model': 'litellm_proxy/deepseek/deepseek-reasoner'}

This shows most sampled explicit models stayed stable, but claude-4.6-opus changed its proxy model string.
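
The base-versus-PR comparison can be automated with a small diff over the two sets of resolved configs. The sketch below uses the llm_config values from the output above; `diff_configs` is a hypothetical helper, not something the QA run is known to have used.

```python
def diff_configs(base: dict, pr: dict) -> dict:
    """Return {model_id: (base_config, pr_config)} for every config that changed."""
    changed = {}
    for model_id in sorted(base.keys() & pr.keys()):
        if base[model_id] != pr[model_id]:
            changed[model_id] = (base[model_id], pr[model_id])
    return changed


# llm_config values copied from the BASE and PR output above.
base = {
    "claude-4.6-opus": {"model": "litellm_proxy/anthropic/claude-opus-4-6", "temperature": 0.0},
    "kimi-k2.6": {"model": "litellm_proxy/moonshot/kimi-k2.6", "temperature": 1.0, "inline_image_urls": True},
    "kimi-k2.5": {"model": "litellm_proxy/moonshot/kimi-k2.5", "temperature": 1.0, "top_p": 0.95},
    "deepseek-v3.2-reasoner": {"model": "litellm_proxy/deepseek/deepseek-reasoner"},
}
pr = {
    **base,
    "claude-4.6-opus": {"model": "litellm_proxy/anthropic/claude-4-6", "temperature": 0.0},
}

for model_id, (before, after) in diff_configs(base, pr).items():
    print(f"{model_id}: {before['model']} -> {after['model']}")
```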

Issues Found

🟠 Issue: Existing model ID claude-4.6-opus now resolves to litellm_proxy/anthropic/claude-4-6 instead of the base branch value litellm_proxy/anthropic/claude-opus-4-6. This appears unrelated to family-based resolution and could break users selecting that existing model.

This review was created by an AI agent (OpenHands) on behalf of the user.


all-hands-bot · 2026-06-19T12:55:06Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

🟢 Good taste — Elegant solution to a real problem.

The design is clean: regex-pattern families for the common case, explicit entries only for genuine deviations. The single resolve_model_config() entry point is the right abstraction. The MODELS backward-compatibility alias is handled correctly.

The claude-4.6-opus regression caught by the earlier QA review has been fixed in this commit (9745166) — verified that both main and the PR head have litellm_proxy/anthropic/claude-opus-4-6.

One minor note left as a non-blocking inline comment.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW
Internal CI script (.github/run-eval/), not a public API. The backward-compatible MODELS alias preserves patch.dict test compatibility. The refactor is covered by 44 tests including representative family-derived IDs. Blast radius is limited to eval runs.

VERDICT:
✅ Worth merging — Core logic is sound, existing regression fixed, tests solid.

KEY INSIGHT:
Replacing a flat dict of explicit entries with regex-pattern family defaults is exactly the right data structure choice — it eliminates an entire class of mechanical PRs.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

The setup-matrix and run-eval workflows imported MODELS directly and checked membership against it, which excluded family-derived models (e.g. deepseek-v4-flash, glm-5.2) that are not in EXPLICIT_MODELS but resolve via family patterns. Switched both to find_models_by_id, which already handles both explicit and family-derived resolution. Co-authored-by: openhands <openhands@all-hands.dev>
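
The difference between the two validation styles can be illustrated as below. This is a simplified sketch: prefix matching stands in for the regex family patterns, and both helper names are hypothetical rather than taken from the workflows.

```python
EXPLICIT_MODELS = {
    "kimi-k2.5": {"llm_config": {"model": "litellm_proxy/moonshot/kimi-k2.5"}},
}
MODELS = EXPLICIT_MODELS

# Simplified stand-in for the regex family patterns.
FAMILY_PREFIXES = ("glm-", "kimi-", "deepseek-", "claude-opus-")


def is_known_by_membership(model_id: str) -> bool:
    """Old check: only explicit entries count as known."""
    return model_id in MODELS


def is_known_by_resolver(model_id: str) -> bool:
    """New check: explicit entries or any ID matching a family."""
    return model_id in EXPLICIT_MODELS or model_id.startswith(FAMILY_PREFIXES)


requested = ["kimi-k2.5", "glm-5.2", "deepseek-v4-flash"]
rejected_before = [m for m in requested if not is_known_by_membership(m)]
rejected_after = [m for m in requested if not is_known_by_resolver(m)]

print(rejected_before)  # ['glm-5.2', 'deepseek-v4-flash']
print(rejected_after)   # []
```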


neubig · 2026-06-19T13:21:23Z

All review feedback has been addressed:

Fixed claude-4.6-opus proxy string regression (was accidentally changed to claude-4-6, restored to claude-opus-4-6 per review comment)
Deep-copied llm_config in the explicit-model path to match the family-derived path's safety (per review suggestion)
Updated integration-runner.yml and run-eval.yml to use find_models_by_id so family-derived models (e.g. deepseek-v4-flash, glm-5.2) resolve correctly in CI

CI is fully green (33 checks passing) and all review threads are resolved. Ready for merge.

This comment was posted by an AI agent (OpenHands) on behalf of the user.

all-hands-bot reviewed Jun 19, 2026


Comment thread .github/run-eval/resolve_model_config.py Outdated

fix: preserve claude-4.6-opus proxy string from main

9745166

Regression introduced during the refactor — the proxy string was accidentally changed from claude-opus-4-6 to claude-4-6. Restored to match main. Co-authored-by: openhands <openhands@all-hands.dev>

neubig added the review-this label (this label triggers a PR review by OpenHands) Jun 19, 2026

all-hands-bot approved these changes Jun 19, 2026


Comment thread .github/run-eval/resolve_model_config.py Outdated

openhands-agent added 2 commits June 19, 2026 09:04

fix: deep-copy llm_config in resolve_model_config explicit path

7e2454b

Matches the safety of the family-derived path, which already builds a fresh llm_config dict. Prevents callers from mutating the global EXPLICIT_MODELS entry. Co-authored-by: openhands <openhands@all-hands.dev>
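
A minimal sketch of the aliasing problem this commit guards against, assuming the explicit path previously returned the stored entry directly. The two resolver functions are illustrative stand-ins, not the code from the PR.

```python
import copy

EXPLICIT_MODELS = {
    "kimi-k2.5": {
        "id": "kimi-k2.5",
        "llm_config": {"model": "litellm_proxy/moonshot/kimi-k2.5", "temperature": 1.0},
    },
}


def resolve_shared(model_id: str) -> dict:
    """Returns the global entry itself, so caller mutations leak into it."""
    return EXPLICIT_MODELS[model_id]


def resolve_copied(model_id: str) -> dict:
    """Returns a new entry whose llm_config is a deep copy of the global one."""
    entry = EXPLICIT_MODELS[model_id]
    return {**entry, "llm_config": copy.deepcopy(entry["llm_config"])}


# Mutating the copied config leaves the global entry untouched.
safe = resolve_copied("kimi-k2.5")
safe["llm_config"]["temperature"] = 0.0
print(EXPLICIT_MODELS["kimi-k2.5"]["llm_config"]["temperature"])  # 1.0

# Mutating the shared config changes the global entry for every later caller.
shared = resolve_shared("kimi-k2.5")
shared["llm_config"]["temperature"] = 0.5
print(EXPLICIT_MODELS["kimi-k2.5"]["llm_config"]["temperature"])  # 0.5
```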

neubig merged commit 740fe63 into main Jun 19, 2026
36 of 37 checks passed

neubig deleted the generalize-model-config branch June 19, 2026 16:42

neubig merged 4 commits into main from generalize-model-config
