Fix mlx_lm.server 404 on short prompts (clamp negative start in think-token search)#1327

Open
devYRPauli wants to merge 1 commit into
ml-explore:mainfrom
devYRPauli:fix/server-404-short-prompt
Conversation

@devYRPauli

Summary

Fixes #1326. mlx_lm.server returns HTTP 404 {"error": "list index out of range"} for any chat message whose templated prompt is shorter than 11 tokens, which includes common one-word messages such as "hi" or "hello". OpenAI-compatible clients (Open WebUI, Chatbox, …) surface this as an unrecoverable error on the first short turn.

Root cause

mlx_lm/server.py computes start = len(prompt) - 11 and passes it to rfind_think_start, which calls TokenizerWrapper._find. Before this change, _find normalized the argument with start = start or 0, which only handles None; a negative start passes through unchanged. The reverse loop, range(end - len(sequence), start - 1, -1), then descends into negative indices. Python's negative indexing first wraps around the token list (producing false matches), and tokens[i] / tokens[i + j] eventually escape the bounds and raise IndexError. The server converts that exception into the 404.
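The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the actual TokenizerWrapper._find: the function name rfind_unclamped, the token ids, and the single-token think marker are all hypothetical, and the prompt length at which the real code raises may differ from this model.

```python
def rfind_unclamped(tokens, sequence, start=None):
    """Hypothetical stand-in for a reverse search without clamping."""
    start = start or 0  # turns None into 0, but leaves a negative start as-is
    for i in range(len(tokens) - len(sequence), start - 1, -1):
        # Once i is negative, tokens[i + j] first wraps around the list
        # and then runs out of bounds.
        if all(tokens[i + j] == sequence[j] for j in range(len(sequence))):
            return i
    return -1


prompt = [101, 102, 103]         # made-up token ids for a very short prompt
think_start = [999]              # made-up think-start marker, absent from prompt
window_start = len(prompt) - 11  # -8, mirroring the server's computation

try:
    rfind_unclamped(prompt, think_start, window_start)
    outcome = "no error"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)
```

In this model the loop visits indices 2, 1, 0, -1, -2, -3 and fails at -4, which is the first index outside a three-element list.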

Fix

Clamp start to 0 in _find (start = max(start or 0, 0)). Because all four think-search helpers (find_think_start, rfind_think_start, find_think_end, rfind_think_end) rely on _find, this single change fixes them all. It is also semantically correct: a prompt shorter than the window should be searched from the beginning.
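As a rough illustration of the clamp, here is a minimal reverse search that applies the same max(start or 0, 0) expression. The function rfind_clamped and the token ids are hypothetical; the real _find takes more parameters and also supports forward search.

```python
def rfind_clamped(tokens, sequence, start=None):
    """Hypothetical reverse search that clamps start the way the fix does."""
    start = max(start or 0, 0)  # None -> 0, negative -> 0, positive unchanged
    n = len(sequence)
    for i in range(len(tokens) - n, start - 1, -1):
        # i never drops below 0, so the slice cannot wrap around
        if tokens[i : i + n] == sequence:
            return i
    return -1


short_prompt = [101, 102, 103]
marker = [999]

# Negative start: searched from the beginning, marker absent
missing = rfind_clamped(short_prompt, marker, len(short_prompt) - 11)

# Negative start: marker present at index 1
found = rfind_clamped([101, 999, 103], marker, 3 - 11)

# Positive start is unchanged: index 0 lies outside the window
outside = rfind_clamped([999, 1, 2, 3], marker, 1)
```

Clamping inside the shared helper, rather than at the call site in the server, means any caller that computes a window start by subtraction gets the same protection.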

Test

Added test_think_search_short_prompt_negative_start, which passes the server's exact start value, len(prompt) - 11 (negative for short prompts), to rfind_think_start and find_think_start and asserts that neither raises IndexError; both return -1 when no think-start is present. The test fails before the fix and passes after.
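The shape of such a regression test might look like the sketch below. It exercises a hypothetical stand-in (rfind_think_start_sketch) instead of the real tokenizer so that it runs without a model; the class name, token ids, and marker id are assumptions for illustration only.

```python
import unittest


def rfind_think_start_sketch(tokens, think_start, start=None):
    """Hypothetical stand-in for a clamped reverse think-start search."""
    start = max(start or 0, 0)
    n = len(think_start)
    for i in range(len(tokens) - n, start - 1, -1):
        if tokens[i : i + n] == think_start:
            return i
    return -1


class ShortPromptThinkSearch(unittest.TestCase):
    def test_negative_start_returns_minus_one(self):
        marker = [999]  # made-up think-start id, never present in the prompt
        # Every prompt length below 11 yields a negative window start.
        for n_tokens in range(11):
            prompt = list(range(n_tokens))
            start = len(prompt) - 11
            self.assertLess(start, 0)
            self.assertEqual(rfind_think_start_sketch(prompt, marker, start), -1)
```

Saved to a file, this can be run with python -m unittest. Looping over every length from 0 to 10 covers the whole range in which the server's subtraction goes negative.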

Verification

  • New regression test: red before / green after.
  • Existing test_thinking still passes; _find happy-path (positive start, forward and reverse) unchanged.
  • End-to-end: with the real Qwen3 chat template, "hi" → 9 tokens → the server's think-search (start = 9 - 11 = -2) now returns -1 instead of raising.
  • black and isort --profile black clean.

…search

mlx_lm.server computes start = len(prompt) - 11 and passes it to
rfind_think_start; for prompts under 11 tokens start is negative, so _find
indexes past the token list and raises IndexError (surfaced as HTTP 404) on
common one-word messages like "hi". Clamp start to 0 in _find; add a
regression test.

Development

Successfully merging this pull request may close these issues.

mlx_lm.server returns 404 "list index out of range" on short prompts (TokenizerWrapper._find doesn't clamp negative start)

1 participant