Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects by i1rr · Pull Request #1328 · ml-explore/mlx-lm

i1rr · 2026-05-30T02:28:21Z

Two related server-resilience fixes triggered by an agent flow against mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit via mlx_lm.server.

1. Qwen3-Coder tool parser — nested `<parameter=...>` substrings in values

Problem

`_parameter_regex` used `<parameter=(.*?)</parameter>` with a non-greedy `.*?`. When a value contained text like:

<parameter=file_text>
def example():
    # see <parameter=path>...</parameter> in the docs
</parameter>

the regex stopped at the first inner `</parameter>`, splitting the value mid-content. The truncated fragment was then no longer valid JSON or a valid Python literal, so `_convert_param_value` raised `SyntaxError` from `ast.literal_eval`. `SyntaxError` is not a subclass of `ValueError` (and `JSONDecodeError` is), so the `except` in `ToolCallFormatter.__call__` did not catch it. The exception escaped into `socketserver.process_request_thread`, producing a noisy traceback in the server log and a half-sent response on the client side.
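The failure mode can be reproduced in isolation. This is a minimal sketch, not the mlx-lm source: the old pattern is reconstructed from the description above, the `re.DOTALL` flag, the `options` parameter, and the helper name `convert_with_narrow_handler` are assumptions made for illustration.

```python
import ast
import json
import re

# Old pattern as described in this PR; the DOTALL flag is an assumption.
OLD_PARAMETER_REGEX = re.compile(r"<parameter=(.*?)</parameter>", re.DOTALL)

# An object-typed value whose content mentions the tag syntax itself.
body = (
    "<parameter=options>\n"
    '{"note": "use <parameter=path>x</parameter> here"}\n'
    "</parameter>"
)

# The non-greedy match ends at the first inner </parameter>.
match_text = OLD_PARAMETER_REGEX.findall(body)[0]
split_at = match_text.index(">")  # the manual name/value split
name = match_text[:split_at]
value = match_text[split_at + 1:].strip()


def convert_with_narrow_handler(text):
    """Hypothetical stand-in for a handler that only expects ValueError."""
    try:
        return ast.literal_eval(text)
    except (ValueError, json.JSONDecodeError):
        return None


try:
    convert_with_narrow_handler(value)
    escaped = None
except SyntaxError as exc:
    # The unterminated string literal surfaces as SyntaxError,
    # which the narrow handler above does not cover.
    escaped = exc

print(repr(name), repr(value), type(escaped).__name__)
```

The captured value stops in the middle of the JSON string, and the resulting `SyntaxError` passes straight through the `except (ValueError, json.JSONDecodeError)` clause.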

Fix

- Rewrite `_parameter_regex` with two capture groups and a trailing lookahead that anchors the closing `</parameter>` to one followed by the next `<parameter=`, the closing `</function>`, or end of input. Literal `<parameter=...>` / `</parameter>` substrings inside a value no longer terminate the match.
- The manual `match_text.index(">")` split is gone: `findall` now returns `(name, value)` directly.
- The handler in `ToolCallFormatter` changes from `except (ValueError, json.JSONDecodeError)` to `except Exception`. Future parser bugs degrade gracefully (warn + skip) instead of corrupting the response.
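The shape of the fix can be sketched as follows. The pattern below is a hypothetical reconstruction from the description, not the regex in the diff; `extract_parameters` and `parse_or_skip` are illustrative names, and stripping whitespace from values is an assumption.

```python
import logging
import re

# Hypothetical pattern: two capture groups plus a lookahead that only accepts
# a </parameter> followed by the next parameter, </function>, or end of input.
PARAMETER_REGEX = re.compile(
    r"<parameter=([^>\n]+)>"  # group 1: parameter name
    r"(.*?)"                  # group 2: value, as short as the lookahead allows
    r"</parameter>"
    r"(?=\s*(?:<parameter=|</function>|$))",
    re.DOTALL,
)


def extract_parameters(function_body):
    """Return {name: value}; findall yields (name, value) tuples directly."""
    return {
        name: value.strip()
        for name, value in PARAMETER_REGEX.findall(function_body)
    }


def parse_or_skip(parser, text):
    """Broad handler: warn and skip instead of letting a parser bug escape."""
    try:
        return parser(text)
    except Exception as exc:
        logging.warning("Skipping malformed tool call: %r", exc)
        return None


body = (
    "<function=str_replace_editor>\n"
    "<parameter=path>\n/tmp/a.py\n</parameter>\n"
    "<parameter=file_text>\n"
    "def example():\n"
    "    # see <parameter=path>...</parameter> in the docs\n"
    "</parameter>\n"
    "</function>"
)

params = extract_parameters(body)
print(params["file_text"])
```

In this sketch the inner `</parameter>` is skipped because it is followed by ` in the docs`, which the lookahead rejects. A value that literally contains `</parameter>` immediately followed by `<parameter=` would still split, so the anchoring is a heuristic rather than a full parser.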

2. `handle_completion` — graceful abort on client disconnect

Problem

Long generations (large prompt cache + big model) can take many minutes. If the client times out and closes the socket mid-generation, the next self.wfile.write(...) raises BrokenPipeError (or another ConnectionError subclass). The exception bypasses the access-log layer and prints a long traceback. Worse, every retry from the same client repeated the cycle, and each retry kept the model busy generating tokens that nobody was going to read — monopolizing the generation lock and starving subsequent requests.

Fix

Add except ConnectionError (covers BrokenPipeError, ConnectionResetError, ConnectionAbortedError) right before the existing finally: ctx.stop(). The teardown then aborts the generation cleanly: ctx.stop() halts the token stream, the model becomes available, and a single concise INFO line replaces the traceback in the log.
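The control flow can be illustrated without a real socket. This is a sketch under stated assumptions: `FakeGenerationContext`, `FlakyClient`, and `stream_tokens` are hypothetical stand-ins, not names from `server.py`; only the `except ConnectionError` / `finally: ctx.stop()` arrangement mirrors the description.

```python
import logging


class FakeGenerationContext:
    """Hypothetical stand-in for the server's generation context."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class FlakyClient:
    """Hypothetical wfile whose peer disconnects after `fail_after` writes."""

    def __init__(self, fail_after):
        self.fail_after = fail_after
        self.chunks = []

    def write(self, data):
        if len(self.chunks) >= self.fail_after:
            raise BrokenPipeError("client closed the socket")
        self.chunks.append(data)


def stream_tokens(wfile, tokens, ctx):
    """Write tokens until done or until the client goes away."""
    sent = 0
    try:
        for token in tokens:
            wfile.write(token.encode("utf-8"))
            sent += 1
    except ConnectionError:
        # Covers BrokenPipeError, ConnectionResetError, ConnectionAbortedError.
        logging.info("Client disconnected after %d chunks; aborting", sent)
    finally:
        ctx.stop()  # runs on both the normal and the disconnect path
    return sent


produced = []


def token_source():
    # Lazy generator: tokens are only produced while the loop keeps pulling.
    for i in range(1000):
        produced.append(i)
        yield f"tok{i}"


ctx = FakeGenerationContext()
client = FlakyClient(fail_after=3)
sent = stream_tokens(client, token_source(), ctx)
print(sent, len(produced), ctx.stopped)
```

Because the handler sits on the same `try` as the `finally`, the disconnect path and the normal path share one teardown, and no further tokens are pulled from the source once the write fails.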

Test plan

- `python -m unittest tests.test_tool_parsing`: 6/6, including the new `test_qwen3_coder_value_contains_parameter_tags`.
- `python -m unittest tests.test_server`: 20/20.
- `black --check` clean on all touched files.
- Manually verified against the original failing prompts: parser returns the correct value, server logs no exception when a client disconnects mid-stream, and the generation actually stops (no orphaned compute after disconnect).

The qwen3_coder parameter regex used a non-greedy `(.*?)</parameter>` match, which terminated at the first `</parameter>` substring it saw. When the model emitted code or markdown that itself contained `<parameter=...>` or `</parameter>` text inside a value (common with str_replace_editor-style tools), the value was truncated and the fragment was no longer parseable as JSON, Python literal, or even a string. The resulting `SyntaxError` from `ast.literal_eval` was also not caught by the server's tool-call handler (which only caught `ValueError`/`JSONDecodeError`), so it escaped as an unhandled exception in the request thread, leaving the client with a half-sent response and a noisy traceback in the server log.

This change:

- Rewrites `_parameter_regex` to anchor the closing `</parameter>` via a lookahead that requires it to be followed by the next parameter, the function end, or end of input. The regex now captures the name and value as separate groups, so the manual `index(">")` split is gone.
- Broadens the tool-parser exception handler in `server.py` to catch any `Exception`, so future parser bugs degrade gracefully (warn and skip the call) instead of corrupting the response.
- Adds a `test_qwen3_coder_value_contains_parameter_tags` test covering both string and object-typed values whose content includes literal `<parameter=...>` / `</parameter>` substrings.

Long generations (large prompt cache, big model) can take many minutes to complete. If the client times out and closes the socket while the server is mid-generation, the next `self.wfile.write(...)` raises `BrokenPipeError` (or another `ConnectionError` subclass). The exception propagates out of `handle_completion`, bypasses the access-log layer, and prints a long traceback in the server log. Worse, the same client behaviour repeated on every retry, and each retry kept the model busy generating tokens that nobody was going to read, monopolizing the generation lock and starving subsequent requests.

Catching `ConnectionError` (which covers `BrokenPipeError`, `ConnectionResetError`, and `ConnectionAbortedError`) right before the existing `finally: ctx.stop()` lets the existing teardown abort the generation cleanly: `ctx.stop()` halts the token stream, the model becomes available, and a single concise INFO line replaces the traceback in the log.

No new test is added: exercising real socket teardown against the live `ThreadingHTTPServer` would be a flaky integration test that would only verify the language-level semantics of `except`/`finally` we already rely on elsewhere in the file.

i1rr added 2 commits May 30, 2026 12:26

i1rr changed the title ~~Fix Qwen3-Coder tool parser on values containing <parameter=...> tags~~ Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects May 30, 2026

i1rr wants to merge 2 commits into ml-explore:main from i1rr:fix-qwen3-coder-nested-param-tags
