Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects #1328

Open
i1rr wants to merge 2 commits into ml-explore:main from i1rr:fix-qwen3-coder-nested-param-tags

i1rr commented May 30, 2026

Two related server-resilience fixes triggered by an agent flow against mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit via mlx_lm.server.

1. Qwen3-Coder tool parser — nested <parameter=...> substrings in values

Problem

_parameter_regex used <parameter=(.*?)</parameter> with non-greedy .*?. When a value contained text like:

<parameter=file_text>
def example():
    # see <parameter=path>...</parameter> in the docs
</parameter>

the regex stopped at the first inner </parameter>, splitting the value mid-content. The truncated tail was then no longer valid JSON or a valid Python literal, so _convert_param_value raised SyntaxError from ast.literal_eval. That SyntaxError is not a ValueError/JSONDecodeError, so the except in ToolCallFormatter.__call__ did not catch it. The exception escaped into socketserver.process_request_thread, producing a noisy traceback in the server log and a half-sent response on the client side.
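The failure can be reproduced in isolation. The sketch below uses the pattern quoted above; `convert` is a hypothetical stand-in for `_convert_param_value` (the real helper may differ), assumed here to try JSON first and then `ast.literal_eval`.

```python
import ast
import json
import re

# Pattern as described in the Problem text: non-greedy up to the first </parameter>.
old_regex = re.compile(r"<parameter=(.*?)</parameter>", re.DOTALL)

text = (
    "<parameter=file_text>\n"
    "def example():\n"
    "    # see <parameter=path>...</parameter> in the docs\n"
    "</parameter>"
)

matches = old_regex.findall(text)
# Manual split on the first ">" to separate the name from the value.
split_at = matches[0].index(">")
name = matches[0][:split_at]
value = matches[0][split_at + 1 :].strip()


def convert(raw):
    # Hypothetical stand-in for _convert_param_value: JSON first, then a Python literal.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


escaped = None
try:
    try:
        convert(value)
    except (ValueError, json.JSONDecodeError):
        pass  # the narrow handler never sees a SyntaxError
except SyntaxError as exc:
    escaped = exc

print(name, repr(value), type(escaped).__name__)
```

The captured value ends at the inner `</parameter>`, so the text after it is lost, and the resulting exception passes straight through the narrow handler because `SyntaxError` does not derive from `ValueError`.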

Fix

  • Rewrite _parameter_regex with two capture groups and a trailing lookahead that anchors the closing </parameter> to one followed by the next <parameter=, the closing </function>, or end of input. Literal <parameter=...> / </parameter> substrings inside a value no longer terminate the match.
  • The manual match_text.index(">") split is gone — findall now returns (name, value) directly.
  • except (ValueError, json.JSONDecodeError) → except Exception in ToolCallFormatter. Future parser bugs degrade gracefully (warn + skip) instead of corrupting the response.
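A minimal sketch of the lookahead idea follows. The pattern and the names `param_regex` and `tool_call` are illustrative assumptions, not necessarily the exact regex in this PR; the sample input reuses the value from the Problem section.

```python
import re

# Two capture groups (name, value). The closing </parameter> only counts when it
# is followed by the next <parameter=, the closing </function>, or end of input.
param_regex = re.compile(
    r"<parameter=([^>\n]+)>(.*?)</parameter>"
    r"(?=\s*(?:<parameter=|</function>|$))",
    re.DOTALL,
)

tool_call = (
    "<function=str_replace_editor>\n"
    "<parameter=path>\n"
    "/tmp/a.py\n"
    "</parameter>\n"
    "<parameter=file_text>\n"
    "def example():\n"
    "    # see <parameter=path>...</parameter> in the docs\n"
    "</parameter>\n"
    "</function>"
)

# findall returns (name, value) tuples directly; trim the framing newlines.
params = [(name, value.strip("\n")) for name, value in param_regex.findall(tool_call)]

for name, value in params:
    print(name, repr(value))
```

In this sketch the inner `</parameter>` is skipped because ` in the docs` follows it rather than one of the three accepted continuations. A value that literally contains `</parameter>` immediately followed by `<parameter=` would still be split, so the anchor is a heuristic and not a full parser.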

2. handle_completion — graceful abort on client disconnect

Problem

Long generations (large prompt cache + big model) can take many minutes. If the client times out and closes the socket mid-generation, the next self.wfile.write(...) raises BrokenPipeError (or another ConnectionError subclass). The exception bypasses the access-log layer and prints a long traceback. Worse, every retry from the same client repeated the cycle, and each retry kept the model busy generating tokens that nobody was going to read — monopolizing the generation lock and starving subsequent requests.

Fix

Add except ConnectionError (covers BrokenPipeError, ConnectionResetError, ConnectionAbortedError) right before the existing finally: ctx.stop(). The teardown then aborts the generation cleanly: ctx.stop() halts the token stream, the model becomes available, and a single concise INFO line replaces the traceback in the log.
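The control flow can be sketched without a real socket. `FakeContext`, `ClosedSocketFile`, and `stream_tokens` below are hypothetical stand-ins for the generation context, `self.wfile`, and the streaming loop in `handle_completion`; only the `except ConnectionError` / `finally` shape mirrors the description above.

```python
import logging


class FakeContext:
    """Hypothetical stand-in for the generation context; only tracks stop()."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class ClosedSocketFile:
    """Hypothetical wfile whose peer disconnects after a few writes."""

    def __init__(self, writes_before_close):
        self.remaining = writes_before_close
        self.written = []

    def write(self, data):
        if self.remaining == 0:
            raise BrokenPipeError("client went away")
        self.remaining -= 1
        self.written.append(data)


def stream_tokens(tokens, wfile, ctx):
    sent = 0
    try:
        for token in tokens:
            wfile.write(token.encode())
            sent += 1
    except ConnectionError as exc:
        # One concise log line instead of a traceback.
        logging.info("Client disconnected mid-stream: %s", type(exc).__name__)
    finally:
        # Runs on success and on disconnect, so generation always stops.
        ctx.stop()
    return sent


ctx = FakeContext()
wfile = ClosedSocketFile(writes_before_close=2)
sent = stream_tokens(iter(["a", "b", "c", "d"]), wfile, ctx)
print(sent, ctx.stopped)
```

Catching the `ConnectionError` base class keeps the handler short while still covering `BrokenPipeError`, `ConnectionResetError`, and `ConnectionAbortedError`, all of which are its subclasses in the standard library.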

Test plan

  • python -m unittest tests.test_tool_parsing — 6/6, including the new test_qwen3_coder_value_contains_parameter_tags.
  • python -m unittest tests.test_server — 20/20.
  • black --check clean on all touched files.
  • Manually verified against the original failing prompts: parser returns the correct value, server logs no exception when a client disconnects mid-stream, and the generation actually stops (no orphaned compute after disconnect).

i1rr added 2 commits May 30, 2026 12:26
The qwen3_coder parameter regex used a non-greedy `(.*?)</parameter>`
match, which terminated at the first `</parameter>` substring it saw.
When the model emitted code or markdown that itself contained
`<parameter=...>` or `</parameter>` text inside a value (common with
str_replace_editor-style tools), the value was truncated and the
fragment was no longer parseable as JSON, Python literal, or even a
string. The resulting `SyntaxError` from `ast.literal_eval` was also
not caught by the server's tool-call handler (which only caught
`ValueError`/`JSONDecodeError`), so it escaped as an unhandled
exception in the request thread, leaving the client with a half-sent
response and a noisy traceback in the server log.

This change:
- Rewrites `_parameter_regex` to anchor the closing `</parameter>` via
  a lookahead that requires it to be followed by the next parameter,
  the function end, or end of input. The regex now captures the name
  and value as separate groups, so the manual `index(">")` split is
  gone.
- Broadens the tool-parser exception handler in `server.py` to catch
  any `Exception`, so future parser bugs degrade gracefully (warn and
  skip the call) instead of corrupting the response.
- Adds a `test_qwen3_coder_value_contains_parameter_tags` test
  covering both string and object-typed values whose content includes
  literal `<parameter=...>` / `</parameter>` substrings.

Long generations (large prompt cache, big model) can take many minutes
to complete. If the client times out and closes the socket while the
server is mid-generation, the next `self.wfile.write(...)` raises
`BrokenPipeError` (or another `ConnectionError` subclass). The exception
propagates out of `handle_completion`, bypasses the access-log layer,
and prints a long traceback in the server log. Worse, the same client
behaviour repeated on every retry — and each retry kept the model busy
generating tokens that nobody was going to read, monopolizing the
generation lock and starving subsequent requests.

Catching `ConnectionError` (which covers `BrokenPipeError`,
`ConnectionResetError`, and `ConnectionAbortedError`) right before the
existing `finally: ctx.stop()` lets the existing teardown abort the
generation cleanly: `ctx.stop()` halts the token stream, the model
becomes available, and a single concise INFO line replaces the
traceback in the log.

No new test is added — exercising real socket teardown against the live
`ThreadingHTTPServer` would be a flaky integration test that would only
verify the language-level semantics of `except`/`finally` we already
rely on elsewhere in the file.
i1rr changed the title from "Fix Qwen3-Coder tool parser on values containing <parameter=...> tags" to "Fix Qwen3-Coder tool parser and harden server against mid-stream client disconnects" May 30, 2026