fix(openai-backend): add json_object fallback for compatible providers#709
fix(openai-backend): add json_object fallback for compatible providers#709liannnix wants to merge 1 commit into
Conversation
Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, etc.)
support {"type": "json_object"} but not the full native
structured-output API (chat.completions.parse / json_schema).
Previously, when parse() failed, the fallback used json_schema
which these providers also ignore, resulting in plain-text responses
that repair_json could not parse — yielding zero observations
in the deriver pipeline.
This patch adds a _json_object_fallback method that:
- Uses {"type": "json_object"} instead of json_schema
- Injects a system message with the JSON schema so the model
returns the correct structure
- Falls through to repair_response_model_json for robust parsing
Tested with Z.AI GLM-5-turbo as the deriver LLM on a self-hosted
Honcho deployment: observation count went from 0 to 12 per batch.
Written using opencode with GLM-5.1.
Co-authored-by: opencode <opencode@anomaly.co>
WalkthroughOpenAIBackend now implements a JSON-object fallback mechanism for structured-output failures. When truncation lacks raw content or when structured responses return no parsed content, the backend invokes a new ChangesJSON object fallback for structured output
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/llm/backends/openai.py (3)
387-394: 💤 Low valueRedundant
import json as _json; reuse the module-leveljson.
jsonis already imported at line 3, and there's no local name shadowing it here, so the in-function aliased import adds noise. Use the module-level import directly.♻️ Proposed cleanup
- fallback_params["response_format"] = {"type": "json_object"} - - import json as _json - - schema = response_format.model_json_schema() - schema_hint = ( - "You MUST respond with ONLY a valid JSON object matching this exact schema: " - + _json.dumps(schema) - + ". Do not wrap in markdown code fences." - ) + fallback_params["response_format"] = {"type": "json_object"} + + schema = response_format.model_json_schema() + schema_hint = ( + "You MUST respond with ONLY a valid JSON object matching this exact schema: " + + json.dumps(schema) + + ". Do not wrap in markdown code fences." + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/openai.py` around lines 387 - 394, Remove the redundant local import "import json as _json" and use the module-level "json" instead in the block that builds schema and schema_hint (the code invoking response_format.model_json_schema() and assigning schema_hint); update schema_hint to concatenate _json.dumps(schema) -> json.dumps(schema) so it references the existing top-level json import and eliminates the unnecessary alias.
188-190: ⚡ Quick winConsider logging when engaging the json_object fallback.
The fallback now silently swallows three failure modes (truncation without content, BadRequest/ValidationError/JSONDecodeError, and
parsed is Nonewithout refusal). Without a log line at each entry point, operators won't be able to tell from telemetry whether a given deployment is on the structured-output happy path or the json_object recovery path, which makes it harder to spot a provider whose native structured output silently regressed. A singlelogger.warning(...)at each call site (or inside_json_object_fallback) noting the reason and model would be enough.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/openai.py` around lines 188 - 190, Add a warning log whenever the json-object recovery path is taken so operators can distinguish structured-output vs fallback behavior: update calls to _json_object_fallback in OpenAIBackend (e.g., where return await self._json_object_fallback(params, response_format, model) is invoked) or add the log inside the _json_object_fallback method itself to emit logger.warning(...) with the model name and a concise reason (e.g., "truncation without content", "BadRequest/ValidationError/JSONDecodeError", or "parsed is None without refusal") before performing the fallback; ensure the message includes model and the specific failure mode to aid telemetry.
395-397: ⚡ Quick winPrepended system message may collide with an existing system prompt.
Some OpenAI-compatible providers (and certain models) only honor the first system message or merge them in surprising ways. Inserting a new system message at index 0 can push the caller's original system prompt out of effect on those backends, weakening the very prompt that shaped the structured-output request. Consider appending to (or merging with) an existing leading system message instead of unconditionally inserting a second one.
♻️ Suggested approach
- msgs = list(fallback_params.get("messages", [])) - msgs.insert(0, {"role": "system", "content": schema_hint}) - fallback_params["messages"] = msgs + msgs = list(fallback_params.get("messages", [])) + if msgs and msgs[0].get("role") == "system": + merged = dict(msgs[0]) + merged["content"] = f"{merged.get('content', '')}\n\n{schema_hint}".strip() + msgs[0] = merged + else: + msgs.insert(0, {"role": "system", "content": schema_hint}) + fallback_params["messages"] = msgs🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/openai.py` around lines 395 - 397, The current code unconditionally inserts a new system message at index 0 (using msgs and schema_hint), which can override a caller-provided leading system prompt on some OpenAI-compatible backends; change the logic to detect whether the first message in fallback_params["messages"] already has role "system" and if so merge schema_hint into that first message's "content" (e.g., append with a clear separator) instead of inserting a second system message, otherwise create a new system message as currently done—update the code paths that manipulate msgs/fallback_params to perform this merge-or-insert behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 408-411: Calling response_format.model_construct() can produce an
empty/partial model that later raises AttributeError and masks a failed provider
response; instead, in the _normalize_response path where you currently pass
content_override=response_format.model_construct(), check whether the provider
returned any content first and if not either (a) construct a safe default model
via a known safe constructor on response_format (e.g.,
response_format.safe_default() if available) and log a warning, or (b) raise a
ValidationException (or a dedicated ResponseValidationError) so callers can
handle retry/skip; update the code around _normalize_response to perform this
validation, log a clear warning including response_format and provider result
when empty, and avoid calling model_construct() blindly.
---
Nitpick comments:
In `@src/llm/backends/openai.py`:
- Around line 387-394: Remove the redundant local import "import json as _json"
and use the module-level "json" instead in the block that builds schema and
schema_hint (the code invoking response_format.model_json_schema() and assigning
schema_hint); update schema_hint to concatenate _json.dumps(schema) ->
json.dumps(schema) so it references the existing top-level json import and
eliminates the unnecessary alias.
- Around line 188-190: Add a warning log whenever the json-object recovery path
is taken so operators can distinguish structured-output vs fallback behavior:
update calls to _json_object_fallback in OpenAIBackend (e.g., where return await
self._json_object_fallback(params, response_format, model) is invoked) or add
the log inside the _json_object_fallback method itself to emit
logger.warning(...) with the model name and a concise reason (e.g., "truncation
without content", "BadRequest/ValidationError/JSONDecodeError", or "parsed is
None without refusal") before performing the fallback; ensure the message
includes model and the specific failure mode to aid telemetry.
- Around line 395-397: The current code unconditionally inserts a new system
message at index 0 (using msgs and schema_hint), which can override a
caller-provided leading system prompt on some OpenAI-compatible backends; change
the logic to detect whether the first message in fallback_params["messages"]
already has role "system" and if so merge schema_hint into that first message's
"content" (e.g., append with a clear separator) instead of inserting a second
system message, otherwise create a new system message as currently done—update
the code paths that manipulate msgs/fallback_params to perform this
merge-or-insert behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f0eb9c77-d21c-40da-a95f-81a514025d67
📒 Files selected for processing (1)
src/llm/backends/openai.py
| return self._normalize_response( | ||
| response, | ||
| content_override=response_format.model_construct(), | ||
| ) |
There was a problem hiding this comment.
model_construct() with no fields can hand downstream code an unusable instance.
response_format.model_construct() bypasses validation and does not populate required fields that lack defaults — accessing those attributes later raises AttributeError, and the deriver will end up treating an empty/partial model as a successful result (the same "zero observations" symptom this PR is trying to fix, just shifted one layer down).
If the provider returned no content here, the fallback has effectively failed; this is more honest as a raised ValidationException so the caller can decide whether to retry, log, or skip. At minimum, a warning log so this state is observable, plus an explicit safe-default constructor when one exists for the model.
🛡️ Suggested change
- response = await self._client.chat.completions.create(**fallback_params)
- raw_content = response.choices[0].message.content or ""
- if raw_content:
- content = repair_response_model_json(
- raw_content,
- response_format,
- model,
- )
- return self._normalize_response(response, content_override=content)
- return self._normalize_response(
- response,
- content_override=response_format.model_construct(),
- )
+ response = await self._client.chat.completions.create(**fallback_params)
+ raw_content = response.choices[0].message.content or ""
+ if raw_content:
+ content = repair_response_model_json(
+ raw_content,
+ response_format,
+ model,
+ )
+ return self._normalize_response(response, content_override=content)
+ logger.warning(
+ "json_object fallback returned empty content for model=%s schema=%s",
+ model,
+ response_format.__name__,
+ )
+ raise ValidationException(
+ f"json_object fallback returned empty content for {response_format.__name__}"
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llm/backends/openai.py` around lines 408 - 411, Calling
response_format.model_construct() can produce an empty/partial model that later
raises AttributeError and masks a failed provider response; instead, in the
_normalize_response path where you currently pass
content_override=response_format.model_construct(), check whether the provider
returned any content first and if not either (a) construct a safe default model
via a known safe constructor on response_format (e.g.,
response_format.safe_default() if available) and log a warning, or (b) raise a
ValidationException (or a dedicated ResponseValidationError) so callers can
handle retry/skip; update the code around _normalize_response to perform this
validation, log a clear warning including response_format and provider result
when empty, and avoid calling model_construct() blindly.
Problem
Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, Together, etc.) support
{"type": "json_object"}but not the full native structured-output API — specificallychat.completions.parse()with a Pydantic model and{"type": "json_schema"}.When Honcho's OpenAI backend calls
parse()against these providers, it fails withValidationErrororLengthFinishReasonError. The current fallback path uses_create_structured_responsewhich sendsjson_schema— these providers also ignore it and return plain text.repair_jsoncannot parse plain text, so the deriver produces zero observations.This is especially problematic with reasoning models (e.g. Z.AI GLM-5-turbo) that spend tokens on
reasoning_content, sometimes leavingcontentempty whenmax_tokensis insufficient.Solution
Add a
_json_object_fallbackmethod toOpenAIBackendthat:{"type": "json_object"}instead ofjson_schemarepair_response_model_jsonfor robust handling of minor formatting issuesThe fallback is triggered in three cases:
parse()→LengthFinishReasonErrorwith emptycontentparse()→BadRequestError/ValidationError/JSONDecodeErrorparse()succeeds butparsed is Nonewith no refusalTesting
Self-hosted Honcho deployment with Z.AI GLM-5-turbo as the deriver LLM:
Also verified with
max_output_tokens=8000to account for reasoning token overhead.Impact
parse()path runs first, fallback only triggers on errorsrepair_response_model_jsonandjsonstdlibSummary by CodeRabbit
Release Notes