Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects#1328
Open
i1rr wants to merge 2 commits into
Open
Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects#1328i1rr wants to merge 2 commits into
i1rr wants to merge 2 commits into
Conversation
The qwen3_coder parameter regex used a non-greedy `(.*?)</parameter>`
match, which terminated at the first `</parameter>` substring it saw.
When the model emitted code or markdown that itself contained
`<parameter=...>` or `</parameter>` text inside a value (common with
str_replace_editor-style tools), the value was truncated and the
fragment was no longer parseable as JSON, Python literal, or even a
string. The resulting `SyntaxError` from `ast.literal_eval` was also
not caught by the server's tool-call handler (which only caught
`ValueError`/`JSONDecodeError`), so it escaped as an unhandled
exception in the request thread, leaving the client with a half-sent
response and a noisy traceback in the server log.
This change:
- Rewrites `_parameter_regex` to anchor the closing `</parameter>` via
a lookahead that requires it to be followed by the next parameter,
the function end, or end of input. The regex now captures the name
and value as separate groups, so the manual `index(">")` split is
gone.
- Broadens the tool-parser exception handler in `server.py` to catch
any `Exception`, so future parser bugs degrade gracefully (warn and
skip the call) instead of corrupting the response.
- Adds a `test_qwen3_coder_value_contains_parameter_tags` test
covering both string and object-typed values whose content includes
literal `<parameter=...>` / `</parameter>` substrings.
Long generations (large prompt cache, big model) can take many minutes to complete. If the client times out and closes the socket while the server is mid-generation, the next `self.wfile.write(...)` raises `BrokenPipeError` (or another `ConnectionError` subclass). The exception propagates out of `handle_completion`, bypasses the access-log layer, and prints a long traceback in the server log. Worse, the same client behaviour repeated on every retry — and each retry kept the model busy generating tokens that nobody was going to read, monopolizing the generation lock and starving subsequent requests. Catching `ConnectionError` (which covers `BrokenPipeError`, `ConnectionResetError`, and `ConnectionAbortedError`) right before the existing `finally: ctx.stop()` lets the existing teardown abort the generation cleanly: `ctx.stop()` halts the token stream, the model becomes available, and a single concise INFO line replaces the traceback in the log. No new test is added — exercising real socket teardown against the live `ThreadingHTTPServer` would be a flaky integration test that would only verify the language-level semantics of `except`/`finally` we already rely on elsewhere in the file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related server-resilience fixes triggered by an agent flow against
mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bitviamlx_lm.server.1. Qwen3-Coder tool parser — nested
<parameter=...>substrings in valuesProblem
_parameter_regexused<parameter=(.*?)</parameter>with non-greedy.*?. When a value contained text like:the regex stopped at the first inner
</parameter>, splitting the value mid-content. The truncated tail was then no longer valid JSON or a valid Python literal, so_convert_param_valueraisedSyntaxErrorfromast.literal_eval. ThatSyntaxErroris not aValueError/JSONDecodeError, so theexceptinToolCallFormatter.__call__did not catch it. The exception escaped intosocketserver.process_request_thread, producing a noisy traceback in the server log and a half-sent response on the client side.Fix
_parameter_regexwith two capture groups and a trailing lookahead that anchors the closing</parameter>to one followed by the next<parameter=, the closing</function>, or end of input. Literal<parameter=...>/</parameter>substrings inside a value no longer terminate the match.match_text.index(">")split is gone —findallnow returns(name, value)directly.except (ValueError, json.JSONDecodeError)→except ExceptioninToolCallFormatter. Future parser bugs degrade gracefully (warn + skip) instead of corrupting the response.2.
handle_completion— graceful abort on client disconnectProblem
Long generations (large prompt cache + big model) can take many minutes. If the client times out and closes the socket mid-generation, the next
self.wfile.write(...)raisesBrokenPipeError(or anotherConnectionErrorsubclass). The exception bypasses the access-log layer and prints a long traceback. Worse, every retry from the same client repeated the cycle, and each retry kept the model busy generating tokens that nobody was going to read — monopolizing the generation lock and starving subsequent requests.Fix
Add
except ConnectionError(coversBrokenPipeError,ConnectionResetError,ConnectionAbortedError) right before the existingfinally: ctx.stop(). The teardown then aborts the generation cleanly:ctx.stop()halts the token stream, the model becomes available, and a single concise INFO line replaces the traceback in the log.Test plan
python -m unittest tests.test_tool_parsing— 6/6, including the newtest_qwen3_coder_value_contains_parameter_tags.python -m unittest tests.test_server— 20/20.black --checkclean on all touched files.