chat : add content-only fallback for JSON_NATIVE tool parser #20800
jpohhhh wants to merge 1 commit into
For templates like Llama 3.3 where tool_start is "{" (no distinctive
marker), the content parser stops at any brace and the tools parser
takes over. If the model output contains braces that aren't valid tool
calls, the tools parser fails with nothing to absorb the remaining
input.
Regression introduced in 566059a (Autoparser ggml-org#18675, 2026-03-06).
Two failure modes on current master:
- Content silently truncated at first "{" (partial match)
- HTTP 500 crash (full parse throws)
Fix: wrap the existing parser in a choice() with a content-only
fallback. The tools path is tried first; when it fails, the fallback
returns everything as content. No behavior change for valid tool calls.
Unit test:
cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat
Server repro (Llama 3.2 3B, temp=0, tools enabled):
llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf --jinja
# 200 before 566059a, 500 after
curl http://localhost:8080/v1/chat/completions -d '{
"messages": [{"role": "user", "content": "Write a hello world C program. Just the code, no explanation."}],
"tools": [{"type": "function", "function": {
"name": "get_weather", "description": "Get weather",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
}}],
"temperature": 0, "max_tokens": 200
}'
Fixed by #20289.
The fix isn't working: it is optional, so the regression persists. Any response containing
The tests are irrelevant since they don't force the pure content parser.
Am I understanding correctly that this works as intended? That is, every Llama 3.x request with tools whose response contains { should return 500 unless A) the { expands into one of the tools, or B) the user passes an option.
I spent 20 minutes thinking about it and realized that can't be the case, because:
I'll open a new PR with a description based on that. Per your comment about unclear issue descriptions, I assume this falls into that bucket, so a new PR would be best practice.