Fix mlx_lm.server 404 on short prompts (clamp negative start in think-token search) #1327
Open
devYRPauli wants to merge 1 commit into
…search mlx_lm.server computes start = len(prompt) - 11 and passes it to rfind_think_start; for prompts under 11 tokens start is negative, so _find indexes past the token list and raises IndexError (surfaced as HTTP 404) on common one-word messages like "hi". Clamp start to 0 in _find; add a regression test.
Summary
Fixes #1326.
`mlx_lm.server` returns `HTTP 404 {"error": "list index out of range"}` for any chat message whose templated prompt is shorter than 11 tokens, which includes common one-word messages like `"hi"` / `"hello"`. OpenAI-compatible clients (Open WebUI, Chatbox, …) surface this as an unrecoverable error on the first short turn.
Root cause
`mlx_lm/server.py` computes `start = len(prompt) - 11` and passes it to `rfind_think_start`, which calls `TokenizerWrapper._find`. `_find` did `start = start or 0`, which only normalizes `None`; a negative `start` passes through. The reverse `range(end - len(sequence), start - 1, -1)` then descends into negative indices; Python negative indexing wraps around the token list (false matches), and `tokens[i]` / `tokens[i + j]` eventually escape the bounds and raise `IndexError`. The server converts that into the 404.
Fix
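To make the difference between the two normalizations concrete, here is a minimal standalone sketch. It is not the `mlx_lm` code: `normalize_old`, `normalize_clamped`, and `reverse_indices` are hypothetical helpers that only mirror the expressions quoted in this description, and the sequence length of 1 is an assumption chosen for illustration.

```python
def normalize_old(start):
    # Pre-fix expression quoted above: only None (or 0) becomes 0.
    return start or 0


def normalize_clamped(start):
    # Post-fix expression quoted above: None and negatives both become 0.
    return max(start or 0, 0)


def reverse_indices(num_tokens, seq_len, start):
    # Positions a reverse search would visit, using the range expression
    # from the root-cause text with end = num_tokens.
    return list(range(num_tokens - seq_len, start - 1, -1))


prompt_len = 9           # token count reported for "hi" in this PR
start = prompt_len - 11  # -2, as the server computes it

old = reverse_indices(prompt_len, 1, normalize_old(start))
new = reverse_indices(prompt_len, 1, normalize_clamped(start))

print(old)  # [8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2]
print(new)  # [8, 7, 6, 5, 4, 3, 2, 1, 0]
```

With the old expression the loop keeps going past index 0 into negative positions; with the clamp it stops at the first token.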
Clamp `start` to `0` in `_find` (`start = max(start or 0, 0)`). This is the central fix for all four think-search helpers (`find/rfind_think_start/end`) and is semantically correct: a prompt shorter than the window should be searched from the beginning.
Test
Added `test_think_search_short_prompt_negative_start`: passes the server's exact `len(prompt) - 11` (negative) start to `rfind_think_start` / `find_think_start` and asserts no `IndexError` (returns `-1` when no think-start is present). Fails before the fix, passes after.
Verification
- `test_thinking` still passes; `_find` happy-path (positive start, forward and reverse) unchanged.
- `"hi"` → 9 tokens → the server's think-search (`start = 9 - 11 = -2`) now returns `-1` instead of raising.
- `black` and `isort --profile black` clean.
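The checks above can be approximated outside the server with a small standalone sketch. This is an illustration under stated assumptions, not the actual `TokenizerWrapper._find`: `rfind_sequence`, its `clamp` flag, and the token ids are all made up for the example, and exactly where the real `_find` raises depends on details not shown in this PR.

```python
def rfind_sequence(tokens, sequence, start=None, end=None, clamp=True):
    """Hypothetical stand-in for a reverse token-sequence search.

    Returns the index of the last match at or after `start`, or -1.
    """
    start = start or 0
    if clamp:
        # The fix described in this PR: never search from a negative start.
        start = max(start, 0)
    end = len(tokens) if end is None else end
    for i in range(end - len(sequence), start - 1, -1):
        if all(tokens[i + j] == sequence[j] for j in range(len(sequence))):
            return i
    return -1


think_start = [7, 8]  # made-up token ids standing in for a think-start marker

# Short prompt: 9 tokens, no marker, start computed as len(prompt) - 11 = -2.
short_prompt = list(range(100, 109))
short_result = rfind_sequence(
    short_prompt, think_start, start=len(short_prompt) - 11
)

# Happy path: 13 tokens, positive start, marker present at index 2.
long_prompt = [1, 2, 7, 8, 3, 4, 5, 6, 9, 10, 11, 12, 13]
long_result = rfind_sequence(
    long_prompt, think_start, start=len(long_prompt) - 11
)

# Without the clamp, a short enough list is indexed past its front.
try:
    rfind_sequence([1, 2, 3], think_start, start=3 - 11, clamp=False)
    unclamped_raised = False
except IndexError:
    unclamped_raised = True

print(short_result, long_result, unclamped_raised)  # -1 2 True
```

The positive-start case goes through the same loop either way, which is consistent with the happy path being unchanged by the clamp.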