Skip to content

fix(openai-backend): add json_object fallback for compatible providers#709

Open
liannnix wants to merge 1 commit into
plastic-labs:mainfrom
liannnix:fix/openai-compat-json-object-fallback
Open

fix(openai-backend): add json_object fallback for compatible providers#709
liannnix wants to merge 1 commit into
plastic-labs:mainfrom
liannnix:fix/openai-compat-json-object-fallback

Conversation

@liannnix
Copy link
Copy Markdown

@liannnix liannnix commented May 21, 2026

Problem

Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, Together, etc.) support {"type": "json_object"} but not the full native structured-output API — specifically chat.completions.parse() with a Pydantic model and {"type": "json_schema"}.

When Honcho's OpenAI backend calls parse() against these providers, it fails with ValidationError or LengthFinishReasonError. The current fallback path uses _create_structured_response which sends json_schema — these providers also ignore it and return plain text. repair_json cannot parse plain text, so the deriver produces zero observations.

This is especially problematic with reasoning models (e.g. Z.AI GLM-5-turbo) that spend tokens on reasoning_content, sometimes leaving content empty when max_tokens is insufficient.

Solution

Add a _json_object_fallback method to OpenAIBackend that:

  1. Uses {"type": "json_object"} instead of json_schema
  2. Injects a system message with the Pydantic model's JSON schema so the model returns the correct key names
  3. Parses the response with repair_response_model_json for robust handling of minor formatting issues

The fallback is triggered in three cases:

  • parse()LengthFinishReasonError with empty content
  • parse()BadRequestError / ValidationError / JSONDecodeError
  • parse() succeeds but parsed is None with no refusal

Testing

Self-hosted Honcho deployment with Z.AI GLM-5-turbo as the deriver LLM:

Metric Before After
Observations per batch 0 4–12
Queue processed (2 min) ~2 items ~150 items

Also verified with max_output_tokens=8000 to account for reasoning token overhead.

Impact

  • No behavior change for providers that support native structured output (OpenAI, Anthropic, Gemini) — the parse() path runs first, fallback only triggers on errors
  • Improves compatibility with OpenAI-compatible providers out of the box
  • No new dependencies — uses existing repair_response_model_json and json stdlib

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved reliability of structured output handling when using OpenAI-compatible API providers. Enhanced error recovery mechanisms to gracefully handle edge cases where responses fail validation.

Review Change Stack

Many OpenAI-compatible providers (Z.AI GLM, Ollama, vLLM, etc.)
support {"type": "json_object"} but not the full native
structured-output API (chat.completions.parse / json_schema).

Previously, when parse() failed, the fallback used json_schema
which these providers also ignore, resulting in plain-text responses
that repair_json could not parse — yielding zero observations
in the deriver pipeline.

This patch adds a _json_object_fallback method that:
- Uses {"type": "json_object"} instead of json_schema
- Injects a system message with the JSON schema so the model
  returns the correct structure
- Falls through to repair_response_model_json for robust parsing

Tested with Z.AI GLM-5-turbo as the deriver LLM on a self-hosted
Honcho deployment: observation count went from 0 to 12 per batch.

Written using opencode with GLM-5.1.

Co-authored-by: opencode <opencode@anomaly.co>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Walkthrough

OpenAIBackend now implements a JSON-object fallback mechanism for structured-output failures. When truncation lacks raw content or when structured responses return no parsed content, the backend invokes a new _json_object_fallback helper that forces basic JSON mode, injects schema hints via system message, and repairs or constructs the result.

Changes

JSON object fallback for structured output

Layer / File(s) Summary
Structured output error fallback integration
src/llm/backends/openai.py
Two error handlers in complete() now call _json_object_fallback() instead of prior fallback strategies: LengthFinishReasonError path switches to the fallback when raw content is missing, and structured-response validation returns the fallback when no parsed content and no refusal are present.
JSON object fallback implementation
src/llm/backends/openai.py
New private method _json_object_fallback() forces {"type": "json_object"} mode, prepends a system message containing the schema from model_json_schema(), performs a completion, repairs JSON from raw content if available or constructs a default instance, then normalizes and returns the result.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Poem

🐰 When JSON structures fail to parse,
A humble fallback saves the day—
Schema hints and repair flow,
Graceful degradation's way!
JSON lives to fight once more.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(openai-backend): add json_object fallback for compatible providers' accurately and concisely summarizes the main change: adding a JSON object fallback mechanism for OpenAI-compatible providers in the OpenAI backend.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/llm/backends/openai.py (3)

387-394: 💤 Low value

Redundant import json as _json; reuse the module-level json.

json is already imported at line 3, and there's no local name shadowing it here, so the in-function aliased import adds noise. Use the module-level import directly.

♻️ Proposed cleanup
-        fallback_params["response_format"] = {"type": "json_object"}
-
-        import json as _json
-
-        schema = response_format.model_json_schema()
-        schema_hint = (
-            "You MUST respond with ONLY a valid JSON object matching this exact schema: "
-            + _json.dumps(schema)
-            + ". Do not wrap in markdown code fences."
-        )
+        fallback_params["response_format"] = {"type": "json_object"}
+
+        schema = response_format.model_json_schema()
+        schema_hint = (
+            "You MUST respond with ONLY a valid JSON object matching this exact schema: "
+            + json.dumps(schema)
+            + ". Do not wrap in markdown code fences."
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 387 - 394, Remove the redundant
local import "import json as _json" and use the module-level "json" instead in
the block that builds schema and schema_hint (the code invoking
response_format.model_json_schema() and assigning schema_hint); update
schema_hint to concatenate _json.dumps(schema) -> json.dumps(schema) so it
references the existing top-level json import and eliminates the unnecessary
alias.

188-190: ⚡ Quick win

Consider logging when engaging the json_object fallback.

The fallback now silently swallows three failure modes (truncation without content, BadRequest/ValidationError/JSONDecodeError, and parsed is None without refusal). Without a log line at each entry point, operators won't be able to tell from telemetry whether a given deployment is on the structured-output happy path or the json_object recovery path, which makes it harder to spot a provider whose native structured output silently regressed. A single logger.warning(...) at each call site (or inside _json_object_fallback) noting the reason and model would be enough.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 188 - 190, Add a warning log
whenever the json-object recovery path is taken so operators can distinguish
structured-output vs fallback behavior: update calls to _json_object_fallback in
OpenAIBackend (e.g., where return await self._json_object_fallback(params,
response_format, model) is invoked) or add the log inside the
_json_object_fallback method itself to emit logger.warning(...) with the model
name and a concise reason (e.g., "truncation without content",
"BadRequest/ValidationError/JSONDecodeError", or "parsed is None without
refusal") before performing the fallback; ensure the message includes model and
the specific failure mode to aid telemetry.

395-397: ⚡ Quick win

Prepended system message may collide with an existing system prompt.

Some OpenAI-compatible providers (and certain models) only honor the first system message or merge them in surprising ways. Inserting a new system message at index 0 can push the caller's original system prompt out of effect on those backends, weakening the very prompt that shaped the structured-output request. Consider appending to (or merging with) an existing leading system message instead of unconditionally inserting a second one.

♻️ Suggested approach
-        msgs = list(fallback_params.get("messages", []))
-        msgs.insert(0, {"role": "system", "content": schema_hint})
-        fallback_params["messages"] = msgs
+        msgs = list(fallback_params.get("messages", []))
+        if msgs and msgs[0].get("role") == "system":
+            merged = dict(msgs[0])
+            merged["content"] = f"{merged.get('content', '')}\n\n{schema_hint}".strip()
+            msgs[0] = merged
+        else:
+            msgs.insert(0, {"role": "system", "content": schema_hint})
+        fallback_params["messages"] = msgs
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 395 - 397, The current code
unconditionally inserts a new system message at index 0 (using msgs and
schema_hint), which can override a caller-provided leading system prompt on some
OpenAI-compatible backends; change the logic to detect whether the first message
in fallback_params["messages"] already has role "system" and if so merge
schema_hint into that first message's "content" (e.g., append with a clear
separator) instead of inserting a second system message, otherwise create a new
system message as currently done—update the code paths that manipulate
msgs/fallback_params to perform this merge-or-insert behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/openai.py`:
- Around line 408-411: Calling response_format.model_construct() can produce an
empty/partial model that later raises AttributeError and masks a failed provider
response; instead, in the _normalize_response path where you currently pass
content_override=response_format.model_construct(), check whether the provider
returned any content first and if not either (a) construct a safe default model
via a known safe constructor on response_format (e.g.,
response_format.safe_default() if available) and log a warning, or (b) raise a
ValidationException (or a dedicated ResponseValidationError) so callers can
handle retry/skip; update the code around _normalize_response to perform this
validation, log a clear warning including response_format and provider result
when empty, and avoid calling model_construct() blindly.

---

Nitpick comments:
In `@src/llm/backends/openai.py`:
- Around line 387-394: Remove the redundant local import "import json as _json"
and use the module-level "json" instead in the block that builds schema and
schema_hint (the code invoking response_format.model_json_schema() and assigning
schema_hint); update schema_hint to concatenate _json.dumps(schema) ->
json.dumps(schema) so it references the existing top-level json import and
eliminates the unnecessary alias.
- Around line 188-190: Add a warning log whenever the json-object recovery path
is taken so operators can distinguish structured-output vs fallback behavior:
update calls to _json_object_fallback in OpenAIBackend (e.g., where return await
self._json_object_fallback(params, response_format, model) is invoked) or add
the log inside the _json_object_fallback method itself to emit
logger.warning(...) with the model name and a concise reason (e.g., "truncation
without content", "BadRequest/ValidationError/JSONDecodeError", or "parsed is
None without refusal") before performing the fallback; ensure the message
includes model and the specific failure mode to aid telemetry.
- Around line 395-397: The current code unconditionally inserts a new system
message at index 0 (using msgs and schema_hint), which can override a
caller-provided leading system prompt on some OpenAI-compatible backends; change
the logic to detect whether the first message in fallback_params["messages"]
already has role "system" and if so merge schema_hint into that first message's
"content" (e.g., append with a clear separator) instead of inserting a second
system message, otherwise create a new system message as currently done—update
the code paths that manipulate msgs/fallback_params to perform this
merge-or-insert behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0eb9c77-d21c-40da-a95f-81a514025d67

📥 Commits

Reviewing files that changed from the base of the PR and between b5f24a6 and 078139b.

📒 Files selected for processing (1)
  • src/llm/backends/openai.py

Comment on lines +408 to +411
return self._normalize_response(
response,
content_override=response_format.model_construct(),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

model_construct() with no fields can hand downstream code an unusable instance.

response_format.model_construct() bypasses validation and does not populate required fields that lack defaults — accessing those attributes later raises AttributeError, and the deriver will end up treating an empty/partial model as a successful result (the same "zero observations" symptom this PR is trying to fix, just shifted one layer down).

If the provider returned no content here, the fallback has effectively failed; this is more honest as a raised ValidationException so the caller can decide whether to retry, log, or skip. At minimum, a warning log so this state is observable, plus an explicit safe-default constructor when one exists for the model.

🛡️ Suggested change
-        response = await self._client.chat.completions.create(**fallback_params)
-        raw_content = response.choices[0].message.content or ""
-        if raw_content:
-            content = repair_response_model_json(
-                raw_content,
-                response_format,
-                model,
-            )
-            return self._normalize_response(response, content_override=content)
-        return self._normalize_response(
-            response,
-            content_override=response_format.model_construct(),
-        )
+        response = await self._client.chat.completions.create(**fallback_params)
+        raw_content = response.choices[0].message.content or ""
+        if raw_content:
+            content = repair_response_model_json(
+                raw_content,
+                response_format,
+                model,
+            )
+            return self._normalize_response(response, content_override=content)
+        logger.warning(
+            "json_object fallback returned empty content for model=%s schema=%s",
+            model,
+            response_format.__name__,
+        )
+        raise ValidationException(
+            f"json_object fallback returned empty content for {response_format.__name__}"
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/openai.py` around lines 408 - 411, Calling
response_format.model_construct() can produce an empty/partial model that later
raises AttributeError and masks a failed provider response; instead, in the
_normalize_response path where you currently pass
content_override=response_format.model_construct(), check whether the provider
returned any content first and if not either (a) construct a safe default model
via a known safe constructor on response_format (e.g.,
response_format.safe_default() if available) and log a warning, or (b) raise a
ValidationException (or a dedicated ResponseValidationError) so callers can
handle retry/skip; update the code around _normalize_response to perform this
validation, log a clear warning including response_format and provider result
when empty, and avoid calling model_construct() blindly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant