Skip to content

chat : add content-only fallback for JSON_NATIVE tool parser#20800

Closed
jpohhhh wants to merge 1 commit into
ggml-org:masterfrom
jpohhhh:fix-json-native-content-fallback
Closed

chat : add content-only fallback for JSON_NATIVE tool parser#20800
jpohhhh wants to merge 1 commit into
ggml-org:masterfrom
jpohhhh:fix-json-native-content-fallback

Conversation

@jpohhhh
Copy link
Copy Markdown
Contributor

@jpohhhh jpohhhh commented Mar 20, 2026

For templates like Llama 3.3 where tool_start is "{" (no distinctive marker), the content parser stops at any brace and the tools parser takes over. If the model output contains braces that aren't valid tool calls, the tools parser fails with nothing to absorb the remaining input.

Regression introduced in 566059a (Autoparser #18675, 2026-03-06).

Two failure modes on current master:

Fix: wrap the existing parser in a choice() with a content-only fallback. The tools path is tried first; when it fails, the fallback returns everything as content. No behavior change for valid tool calls.

Unit test:

cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat

Server repro (Llama 3.2 3B, temp=0, tools enabled):

llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja

200 before 566059a, 500 after

curl http://localhost:8080/v1/chat/completions -d '{
"messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}],
"tools": [{"type": "function", "function": {
"name": "get_weather", "description": "Get weather",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
}}],
"temperature": 0, "max_tokens": 200
}'

For templates like Llama 3.3 where tool_start is "{" (no distinctive
marker), the content parser stops at any brace and the tools parser
takes over. If the model output contains braces that aren't valid tool
calls, the tools parser fails with nothing to absorb the remaining
input.

Regression introduced in 566059a (Autoparser ggml-org#18675, 2026-03-06).

Two failure modes on current master:
  - Content silently truncated at first "{" (partial match)
  - HTTP 500 crash (full parse throws)

Fix: wrap the existing parser in a choice() with a content-only
fallback. The tools path is tried first; when it fails, the fallback
returns everything as content. No behavior change for valid tool calls.

Unit test:

  cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
  cmake --build build --target test-chat
  ./build/bin/test-chat

Server repro (Llama 3.2 3B, temp=0, tools enabled):

  llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja

  # 200 before 566059a, 500 after
  curl http://localhost:8080/v1/chat/completions -d '{
    "messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}],
    "tools": [{"type": "function", "function": {
      "name": "get_weather", "description": "Get weather",
      "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    }}],
    "temperature": 0, "max_tokens": 200
  }'
@jpohhhh jpohhhh requested review from a team and pwilkin as code owners March 20, 2026 14:23
@github-actions github-actions Bot added the testing Everything test related label Mar 20, 2026
@pwilkin
Copy link
Copy Markdown
Member

pwilkin commented Mar 20, 2026

Fixed by #20289.

@pwilkin pwilkin closed this Mar 20, 2026
@jpohhhh
Copy link
Copy Markdown
Contributor Author

jpohhhh commented Mar 20, 2026

Fixed by #20289.

The fix isn't working, it is optional, so the regression persists. Any response containing { causes crash with Llama 3.x. please run tests in PR, or in description.

@pwilkin
Copy link
Copy Markdown
Member

pwilkin commented Mar 20, 2026

The tests are irrelevant since they don't force the pure content parser.

@jpohhhh
Copy link
Copy Markdown
Contributor Author

jpohhhh commented Mar 20, 2026

The tests are irrelevant since they don't force the pure content parser.

Am I understanding correctly:

This works as intended: all Llama 3.x requests with tools calls and responses containing { should return 500 unless A) the { expands into one of the tools B) the user passes an option.

@jpohhhh
Copy link
Copy Markdown
Contributor Author

jpohhhh commented Mar 20, 2026

Spent 20 minutes thinking on it, realized that can't be the case, because:

  • the argument can only work with the server binary, not API
  • disabling tool calls altogether works around the issue, but doesn't resolve it - i.e. user shouldn't have to disable tool calling altogether to be allowed text responses with {

I'll open a new PR with description based on that, per your comment re: unclear issue descriptions, I assume this falls into that bucket and thus that would be best practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants