fix(openai-backend): add json_object fallback for compatible providers by liannnix · Pull Request #709 · plastic-labs/honcho

liannnix · 2026-05-21T13:41:11Z

Problem

Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, Together, etc.) support {"type": "json_object"} but not the full native structured-output API — specifically chat.completions.parse() with a Pydantic model and {"type": "json_schema"}.

When Honcho's OpenAI backend calls parse() against these providers, it fails with ValidationError or LengthFinishReasonError. The current fallback path uses _create_structured_response which sends json_schema — these providers also ignore it and return plain text. repair_json cannot parse plain text, so the deriver produces zero observations.

This is especially problematic with reasoning models (e.g. Z.AI GLM-5-turbo) that spend tokens on reasoning_content, sometimes leaving content empty when max_tokens is insufficient.

Solution

Add a _json_object_fallback method to OpenAIBackend that:

- Uses {"type": "json_object"} instead of json_schema
- Injects a system message with the Pydantic model's JSON schema so the model returns the correct key names
- Parses the response with repair_response_model_json for robust handling of minor formatting issues
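The first two steps can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual method: `build_json_object_params` is a hypothetical helper name, the schema dict stands in for `response_format.model_json_schema()`, and the hint wording is taken from the diff quoted later in the review.

```python
import copy
import json
from typing import Any


def build_json_object_params(
    params: dict[str, Any], schema: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of `params` rewritten for basic JSON mode (hypothetical helper)."""
    fallback = copy.deepcopy(params)  # leave the caller's request untouched
    fallback["response_format"] = {"type": "json_object"}

    # Spell the schema out in the prompt, since json_object mode does not enforce it.
    schema_hint = (
        "You MUST respond with ONLY a valid JSON object matching this exact schema: "
        + json.dumps(schema)
        + ". Do not wrap in markdown code fences."
    )
    messages = list(fallback.get("messages", []))
    messages.insert(0, {"role": "system", "content": schema_hint})
    fallback["messages"] = messages
    return fallback


# Hypothetical schema, standing in for response_format.model_json_schema().
schema = {
    "type": "object",
    "properties": {"observations": {"type": "array", "items": {"type": "string"}}},
    "required": ["observations"],
}
request = {"model": "example-model", "messages": [{"role": "user", "content": "Hi"}]}
fallback_request = build_json_object_params(request, schema)
```

The resulting dict would then be passed to `chat.completions.create(**fallback_request)`; copying the parameters first keeps the original request reusable if the caller retries.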

The fallback is triggered in three cases:

- parse() → LengthFinishReasonError with empty content
- parse() → BadRequestError / ValidationError / JSONDecodeError
- parse() succeeds but parsed is None with no refusal
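The three trigger cases amount to a small decision function. The sketch below is an assumption-laden illustration: `ParseOutcome` and `fallback_reason` are hypothetical names, and exception types are represented by their class names as strings so the example runs without the OpenAI SDK installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseOutcome:
    """Hypothetical summary of one parse() attempt (not a Honcho or OpenAI type)."""

    error: Optional[str] = None      # exception class name, if parse() raised
    raw_content: str = ""            # content recovered from a truncated completion
    parsed: Optional[object] = None  # the parsed model, if any
    refusal: Optional[str] = None    # refusal text, if the model declined


RECOVERABLE_ERRORS = {"BadRequestError", "ValidationError", "JSONDecodeError"}


def fallback_reason(outcome: ParseOutcome) -> Optional[str]:
    """Return why the json_object fallback should run, or None if it should not."""
    if outcome.error == "LengthFinishReasonError" and not outcome.raw_content:
        return "truncation without content"
    if outcome.error in RECOVERABLE_ERRORS:
        return outcome.error
    if outcome.error is None and outcome.parsed is None and not outcome.refusal:
        return "parsed is None without refusal"
    return None
```

Returning a reason string rather than a bare boolean makes it easy to include the failure mode in a log line when the fallback engages.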

Testing

Self-hosted Honcho deployment with Z.AI GLM-5-turbo as the deriver LLM:

| Metric | Before | After |
| --- | --- | --- |
| Observations per batch | 0 | 4–12 |
| Queue processed (2 min) | ~2 items | ~150 items |

Also verified with max_output_tokens=8000 to account for reasoning token overhead.

Impact

- No behavior change for providers that support native structured output (OpenAI, Anthropic, Gemini): the parse() path runs first, and the fallback triggers only when parse() fails or returns no parsed result
- Improves compatibility with OpenAI-compatible providers out of the box
- No new dependencies: uses the existing repair_response_model_json and the json stdlib module

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved reliability of structured output handling when using OpenAI-compatible API providers. Enhanced error recovery mechanisms to gracefully handle edge cases where responses fail validation.

Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, etc.) support {"type": "json_object"} but not the full native structured-output API (chat.completions.parse / json_schema). Previously, when parse() failed, the fallback used json_schema, which these providers also ignore, resulting in plain-text responses that repair_json could not parse and yielding zero observations in the deriver pipeline.

This patch adds a _json_object_fallback method that:

- Uses {"type": "json_object"} instead of json_schema
- Injects a system message with the JSON schema so the model returns the correct structure
- Falls through to repair_response_model_json for robust parsing

Tested with Z.AI GLM-5-turbo as the deriver LLM on a self-hosted Honcho deployment: observation count went from 0 to 12 per batch.

Written using opencode with GLM-5.1.

Co-authored-by: opencode <opencode@anomaly.co>

coderabbitai · 2026-05-21T13:44:31Z

Walkthrough

OpenAIBackend now implements a JSON-object fallback mechanism for structured-output failures. When truncation lacks raw content or when structured responses return no parsed content, the backend invokes a new _json_object_fallback helper that forces basic JSON mode, injects schema hints via system message, and repairs or constructs the result.

Changes

JSON object fallback for structured output

| Layer / File(s) | Summary |
| --- | --- |
| Structured output error fallback integration `src/llm/backends/openai.py` | Two error handlers in `complete()` now call `_json_object_fallback()` instead of prior fallback strategies: `LengthFinishReasonError` path switches to the fallback when raw content is missing, and structured-response validation returns the fallback when no `parsed` content and no refusal are present. |
| JSON object fallback implementation `src/llm/backends/openai.py` | New private method `_json_object_fallback()` forces `{"type": "json_object"}` mode, prepends a system message containing the schema from `model_json_schema()`, performs a completion, repairs JSON from raw content if available or constructs a default instance, then normalizes and returns the result. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

[Bug] OpenAI-compatible chat responses can crash backend when choices or usage are missing #676: The new JSON-object fallback directly addresses concerns about brittle structured-output error handling and implements the suggested fallback chain for missing choices, usage, or parsed content.

Poem

🐰 When JSON structures fail to parse,
A humble fallback saves the day—
Schema hints and repair flow,
Graceful degradation's way!
JSON lives to fight once more.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(openai-backend): add json_object fallback for compatible providers' accurately and concisely summarizes the main change: adding a JSON object fallback mechanism for OpenAI-compatible providers in the OpenAI backend. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

src/llm/backends/openai.py (3)

387-394: 💤 Low value

Redundant import json as _json; reuse the module-level json.

json is already imported at line 3, and there's no local name shadowing it here, so the in-function aliased import adds noise. Use the module-level import directly.

♻️ Proposed cleanup

-        fallback_params["response_format"] = {"type": "json_object"}
-
-        import json as _json
-
-        schema = response_format.model_json_schema()
-        schema_hint = (
-            "You MUST respond with ONLY a valid JSON object matching this exact schema: "
-            + _json.dumps(schema)
-            + ". Do not wrap in markdown code fences."
-        )
+        fallback_params["response_format"] = {"type": "json_object"}
+
+        schema = response_format.model_json_schema()
+        schema_hint = (
+            "You MUST respond with ONLY a valid JSON object matching this exact schema: "
+            + json.dumps(schema)
+            + ". Do not wrap in markdown code fences."
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 387 - 394, Remove the redundant
local import "import json as _json" and use the module-level "json" instead in
the block that builds schema and schema_hint (the code invoking
response_format.model_json_schema() and assigning schema_hint); update
schema_hint to concatenate _json.dumps(schema) -> json.dumps(schema) so it
references the existing top-level json import and eliminates the unnecessary
alias.

188-190: ⚡ Quick win

Consider logging when engaging the json_object fallback.

The fallback now silently swallows three failure modes (truncation without content, BadRequest/ValidationError/JSONDecodeError, and parsed is None without refusal). Without a log line at each entry point, operators won't be able to tell from telemetry whether a given deployment is on the structured-output happy path or the json_object recovery path, which makes it harder to spot a provider whose native structured output silently regressed. A single logger.warning(...) at each call site (or inside _json_object_fallback) noting the reason and model would be enough.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 188 - 190, Add a warning log
whenever the json-object recovery path is taken so operators can distinguish
structured-output vs fallback behavior: update calls to _json_object_fallback in
OpenAIBackend (e.g., where return await self._json_object_fallback(params,
response_format, model) is invoked) or add the log inside the
_json_object_fallback method itself to emit logger.warning(...) with the model
name and a concise reason (e.g., "truncation without content",
"BadRequest/ValidationError/JSONDecodeError", or "parsed is None without
refusal") before performing the fallback; ensure the message includes model and
the specific failure mode to aid telemetry.
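A warning of the kind suggested here might look like the following sketch. The logger name, function name, and message format are illustrative assumptions, not Honcho's actual logging conventions; the small list-backed handler exists only so the example can show what was emitted.

```python
import logging

logger = logging.getLogger("openai_backend_example")  # hypothetical logger name


def log_json_object_fallback(model: str, reason: str) -> None:
    """Record that the json_object recovery path was taken, and why."""
    logger.warning(
        "structured output failed; using json_object fallback (model=%s, reason=%s)",
        model,
        reason,
    )


class _ListHandler(logging.Handler):
    """Collects rendered messages so the example can inspect what was logged."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = _ListHandler()
logger.addHandler(handler)
log_json_object_fallback("example-model", "parsed is None without refusal")
```

Placing the call inside the fallback method itself, with the reason passed in by each call site, would keep the three entry points consistent with a single log format.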

395-397: ⚡ Quick win

Prepended system message may collide with an existing system prompt.

Some OpenAI-compatible providers (and certain models) only honor the first system message or merge them in surprising ways. Inserting a new system message at index 0 can push the caller's original system prompt out of effect on those backends, weakening the very prompt that shaped the structured-output request. Consider appending to (or merging with) an existing leading system message instead of unconditionally inserting a second one.

♻️ Suggested approach

-        msgs = list(fallback_params.get("messages", []))
-        msgs.insert(0, {"role": "system", "content": schema_hint})
-        fallback_params["messages"] = msgs
+        msgs = list(fallback_params.get("messages", []))
+        if msgs and msgs[0].get("role") == "system":
+            merged = dict(msgs[0])
+            merged["content"] = f"{merged.get('content', '')}\n\n{schema_hint}".strip()
+            msgs[0] = merged
+        else:
+            msgs.insert(0, {"role": "system", "content": schema_hint})
+        fallback_params["messages"] = msgs

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 395 - 397, The current code
unconditionally inserts a new system message at index 0 (using msgs and
schema_hint), which can override a caller-provided leading system prompt on some
OpenAI-compatible backends; change the logic to detect whether the first message
in fallback_params["messages"] already has role "system" and if so merge
schema_hint into that first message's "content" (e.g., append with a clear
separator) instead of inserting a second system message, otherwise create a new
system message as currently done—update the code paths that manipulate
msgs/fallback_params to perform this merge-or-insert behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0eb9c77-d21c-40da-a95f-81a514025d67

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 078139b.

📒 Files selected for processing (1)

src/llm/backends/openai.py

coderabbitai · 2026-05-21T13:47:22Z

+        return self._normalize_response(
+            response,
+            content_override=response_format.model_construct(),
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

model_construct() with no fields can hand downstream code an unusable instance.

response_format.model_construct() bypasses validation and does not populate required fields that lack defaults — accessing those attributes later raises AttributeError, and the deriver will end up treating an empty/partial model as a successful result (the same "zero observations" symptom this PR is trying to fix, just shifted one layer down).

If the provider returned no content here, the fallback has effectively failed; raising a ValidationException reports that honestly and lets the caller decide whether to retry, log, or skip. At minimum, emit a warning log so this state is observable, and use an explicit safe-default constructor when the model defines one.

🛡️ Suggested change

-        response = await self._client.chat.completions.create(**fallback_params)
-        raw_content = response.choices[0].message.content or ""
-        if raw_content:
-            content = repair_response_model_json(
-                raw_content,
-                response_format,
-                model,
-            )
-            return self._normalize_response(response, content_override=content)
-        return self._normalize_response(
-            response,
-            content_override=response_format.model_construct(),
-        )
+        response = await self._client.chat.completions.create(**fallback_params)
+        raw_content = response.choices[0].message.content or ""
+        if raw_content:
+            content = repair_response_model_json(
+                raw_content,
+                response_format,
+                model,
+            )
+            return self._normalize_response(response, content_override=content)
+        logger.warning(
+            "json_object fallback returned empty content for model=%s schema=%s",
+            model,
+            response_format.__name__,
+        )
+        raise ValidationException(
+            f"json_object fallback returned empty content for {response_format.__name__}"
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 408 - 411, Calling
response_format.model_construct() can produce an empty/partial model that later
raises AttributeError and masks a failed provider response; instead, in the
_normalize_response path where you currently pass
content_override=response_format.model_construct(), check whether the provider
returned any content first and if not either (a) construct a safe default model
via a known safe constructor on response_format (e.g.,
response_format.safe_default() if available) and log a warning, or (b) raise a
ValidationException (or a dedicated ResponseValidationError) so callers can
handle retry/skip; update the code around _normalize_response to perform this
validation, log a clear warning including response_format and provider result
when empty, and avoid calling model_construct() blindly.
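The "raise instead of fabricating a default" option can be illustrated without the backend. In this sketch `EmptyFallbackContentError` is a hypothetical stand-in for the ValidationException named above, `parse_fallback_content` is an illustrative name, and plain `json.loads` stands in for repair_response_model_json.

```python
import json
from typing import Any


class EmptyFallbackContentError(ValueError):
    """Hypothetical stand-in for the ValidationException named in the review."""


def parse_fallback_content(raw_content: str, schema_name: str) -> Any:
    """Parse fallback output, refusing to invent a result when there is none."""
    if not raw_content.strip():
        raise EmptyFallbackContentError(
            f"json_object fallback returned empty content for {schema_name}"
        )
    # json.loads is a simplification; the PR uses repair_response_model_json here.
    return json.loads(raw_content)


parsed = parse_fallback_content('{"observations": ["likes tea"]}', "ExampleModel")

try:
    parse_fallback_content("", "ExampleModel")
    empty_was_rejected = False
except EmptyFallbackContentError:
    empty_was_rejected = True
```

With this shape, an empty provider response surfaces as an exception the caller can retry or skip, rather than as an apparently successful result with no fields set.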

liannnix wants to merge 1 commit into plastic-labs:main from liannnix:fix/openai-compat-json-object-fallback
