Fix mlx_lm.server 404 on short prompts (clamp negative start in think-token search) by devYRPauli · Pull Request #1327 · ml-explore/mlx-lm

devYRPauli · 2026-05-29T15:09:36Z

Summary

Fixes #1326. mlx_lm.server returns HTTP 404 {"error": "list index out of range"} for any chat message whose templated prompt is shorter than 11 tokens, which includes common one-word messages like "hi" or "hello". OpenAI-compatible clients (Open WebUI, Chatbox, …) surface this as an unrecoverable error on the first short turn.

Root cause

mlx_lm/server.py computes start = len(prompt) - 11 and passes it to rfind_think_start, which calls TokenizerWrapper._find. Before this change, _find normalized the argument with start = start or 0, which only replaces None (or 0) with 0; a negative start passes through unchanged. The reverse range(end - len(sequence), start - 1, -1) then descends into negative indices. Python's negative indexing wraps around the token list, which can produce false matches, and tokens[i] / tokens[i + j] eventually go out of bounds and raise IndexError. The server converts that exception into the 404.
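The arithmetic above can be traced with a few lines of plain Python. This is only an illustrative sketch: the token ids are made up, the sequence length of 1 is an assumption, and none of this is the actual _find code, so it shows how a negative start leaks into the reverse range rather than the exact path the real implementation takes to IndexError.

```python
# Nine made-up token ids standing in for a short templated prompt.
tokens = [101, 102, 103, 104, 105, 106, 107, 108, 109]

start = len(tokens) - 11   # 9 - 11 = -2
start = start or 0         # None or 0 would become 0; -2 is truthy and stays -2

# Reverse scan positions for a one-token sequence (assumed length),
# using the range expression quoted in the description.
sequence_len = 1
end = len(tokens)
visited = list(range(end - sequence_len, start - 1, -1))
print(visited)  # [8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2]

# Negative positions silently wrap around to the end of the list.
print(tokens[-1] == tokens[8])  # True

# An index below -len(tokens) no longer wraps and raises IndexError.
try:
    tokens[-len(tokens) - 1]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```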

Fix

Clamp start to 0 in _find (start = max(start or 0, 0)). Because all four think-search helpers (find_think_start, rfind_think_start, find_think_end, rfind_think_end) go through _find, this one change fixes them all. It is also semantically correct: a prompt shorter than the window should simply be searched from the beginning.
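To make the difference between the two normalizations concrete, the sketch below compares them on a handful of inputs. The function names normalize_old and normalize_clamped are hypothetical wrappers introduced here for illustration; only the two expressions inside them come from the description.

```python
def normalize_old(start):
    # Previous behaviour: only None (or 0) is replaced by 0.
    return start or 0


def normalize_clamped(start):
    # Behaviour after the fix: negative values are clamped to 0 as well.
    return max(start or 0, 0)


cases = [None, 0, 5, -2]
old = [normalize_old(s) for s in cases]
new = [normalize_clamped(s) for s in cases]

print(old)  # [0, 0, 5, -2]
print(new)  # [0, 0, 5, 0]
```

Non-negative inputs are unaffected, which is consistent with the happy path being unchanged.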

Test

Added test_think_search_short_prompt_negative_start: passes the server's exact len(prompt) - 11 (negative) start to rfind_think_start / find_think_start and asserts no IndexError (returns -1 when no think-start is present). Fails before the fix, passes after.
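The shape of such a regression check can be sketched against a simplified stand-in. Everything below is an assumption made for illustration: find_tokens is a hypothetical helper, not TokenizerWrapper._find, the token ids are invented, and the real test in this PR exercises the tokenizer's own rfind_think_start / find_think_start instead.

```python
def find_tokens(tokens, sequence, start=None, end=None, reverse=False):
    """Hypothetical stand-in for a clamped subsequence search."""
    start = max(start or 0, 0)  # the clamp described in the fix
    end = len(tokens) if end is None else min(end, len(tokens))
    last = end - len(sequence)
    positions = range(last, start - 1, -1) if reverse else range(start, last + 1)
    for i in positions:
        if tokens[i : i + len(sequence)] == sequence:
            return i
    return -1


def test_short_prompt_negative_start():
    think_start = [7]  # made-up id for the think-start token

    # Nine tokens, no think-start: start is 9 - 11 = -2.
    prompt = [1, 2, 3, 4, 5, 6, 8, 9, 10]
    start = len(prompt) - 11
    assert find_tokens(prompt, think_start, start, reverse=True) == -1
    assert find_tokens(prompt, think_start, start) == -1

    # A short prompt that does contain think-start is still found.
    with_think = [1, 2, 7, 3]
    start = len(with_think) - 11
    assert find_tokens(with_think, think_start, start, reverse=True) == 2
    assert find_tokens(with_think, think_start, start) == 2


test_short_prompt_negative_start()
```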

Verification

New regression test: red before / green after.
Existing test_thinking still passes; _find happy-path (positive start, forward and reverse) unchanged.
End-to-end: with the real Qwen3 chat template, "hi" → 9 tokens → the server's think-search (start = 9 - 11 = -2) now returns -1 instead of raising.
black and isort --profile black clean.

…search mlx_lm.server computes start = len(prompt) - 11 and passes it to rfind_think_start; for prompts under 11 tokens start is negative, so _find indexes past the token list and raises IndexError (surfaced as HTTP 404) on common one-word messages like "hi". Clamp start to 0 in _find; add a regression test.

devYRPauli wants to merge 1 commit into ml-explore:main from devYRPauli:fix/server-404-short-prompt
