chat : add content-only fallback for JSON_NATIVE tool parser by jpohhhh · Pull Request #20800 · ggml-org/llama.cpp

jpohhhh · 2026-03-20T14:23:49Z

For templates like Llama 3.3 where tool_start is "{" (no distinctive marker), the content parser stops at any brace and the tools parser takes over. If the model output contains braces that aren't valid tool calls, the tools parser fails with nothing to absorb the remaining input.

Regression introduced in 566059a (Autoparser #18675, 2026-03-06).

Two failure modes on current master:

Content silently truncated at first "{" (partial match)
HTTP 500 crash (full parse throws, per server: workaround new chat parser regression #20729 demo of new chat parser core regression)

Fix: wrap the existing parser in a choice() with a content-only fallback. The tools path is tried first; when it fails, the fallback returns everything as content. No behavior change for valid tool calls.

Unit test:

cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat

Server repro (Llama 3.2 3B, temp=0, tools enabled):

llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja

200 before `566059a`, 500 after

curl http://localhost:8080/v1/chat/completions -d '{
"messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}],
"tools": [{"type": "function", "function": {
"name": "get_weather", "description": "Get weather",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
}}],
"temperature": 0, "max_tokens": 200
}'

For templates like Llama 3.3 where tool_start is "{" (no distinctive marker), the content parser stops at any brace and the tools parser takes over. If the model output contains braces that aren't valid tool calls, the tools parser fails with nothing to absorb the remaining input. Regression introduced in 566059a (Autoparser ggml-org#18675, 2026-03-06). Two failure modes on current master: - Content silently truncated at first "{" (partial match) - HTTP 500 crash (full parse throws) Fix: wrap the existing parser in a choice() with a content-only fallback. The tools path is tried first; when it fails, the fallback returns everything as content. No behavior change for valid tool calls. Unit test: cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF cmake --build build --target test-chat ./build/bin/test-chat Server repro (Llama 3.2 3B, temp=0, tools enabled): llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja # 200 before 566059a, 500 after curl http://localhost:8080/v1/chat/completions -d '{ "messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}], "tools": [{"type": "function", "function": { "name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]} }}], "temperature": 0, "max_tokens": 200 }'

pwilkin · 2026-03-20T14:55:12Z

Fixed by #20289.

jpohhhh · 2026-03-20T14:57:18Z

Fixed by #20289.

The fix isn't working, it is optional, so the regression persists. Any response containing { causes crash with Llama 3.x. please run tests in PR, or in description.

pwilkin · 2026-03-20T14:58:08Z

The tests are irrelevant since they don't force the pure content parser.

jpohhhh · 2026-03-20T15:00:21Z

The tests are irrelevant since they don't force the pure content parser.

Am I understanding correctly:

This works as intended: all Llama 3.x requests with tools calls and responses containing { should return 500 unless A) the { expands into one of the tools B) the user passes an option.

jpohhhh · 2026-03-20T15:23:58Z

Spent 20 minutes thinking on it, realized that can't be the case, because:

the argument can only work with the server binary, not API
disabling tool calls altogether works around the issue, but doesn't resolve it - i.e. user shouldn't have to disable tool calling altogether to be allowed text responses with {

I'll open a new PR with description based on that, per your comment re: unclear issue descriptions, I assume this falls into that bucket and thus that would be best practice.

jpohhhh requested review from a team and pwilkin as code owners March 20, 2026 14:23

github-actions Bot added the testing Everything test related label Mar 20, 2026

pwilkin closed this Mar 20, 2026

jpohhhh mentioned this pull request Mar 20, 2026

chat : fix Llama 3.x throwing runtime exception if response contains { #20806

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chat : add content-only fallback for JSON_NATIVE tool parser#20800

chat : add content-only fallback for JSON_NATIVE tool parser#20800
jpohhhh wants to merge 1 commit into
ggml-org:masterfrom
jpohhhh:fix-json-native-content-fallback

jpohhhh commented Mar 20, 2026

Uh oh!

pwilkin commented Mar 20, 2026

Uh oh!

jpohhhh commented Mar 20, 2026 •

edited

Loading

Uh oh!

pwilkin commented Mar 20, 2026

Uh oh!

jpohhhh commented Mar 20, 2026 •

edited

Loading

Uh oh!

jpohhhh commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jpohhhh commented Mar 20, 2026

200 before 566059a, 500 after

Uh oh!

pwilkin commented Mar 20, 2026

Uh oh!

jpohhhh commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pwilkin commented Mar 20, 2026

Uh oh!

jpohhhh commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jpohhhh commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

200 before `566059a`, 500 after

jpohhhh commented Mar 20, 2026 •

edited

Loading

jpohhhh commented Mar 20, 2026 •

edited

Loading