feat(server): instruction + chat-template wrappers for embedding inputs (#130)
Merged
Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni)
need their inputs wrapped in a task instruction + the chat template, then
last-token pooled. Without it the trained embedding position is never hit
and retrieval — especially cross-modal text<->image — collapses to
near-random. The /embedding path tokenized raw content, so these models
served poor embeddings even though images were no longer dropped.
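
For readers unfamiliar with the pooling step, here is a minimal sketch of last-token pooling followed by L2 normalization. It is written in plain Python over toy lists and is not the server's implementation; the function names and the attention-mask representation are assumptions made for illustration only.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state at the last non-padding position.

    hidden_states: one vector per token position.
    attention_mask: 1 for real tokens, 0 for padding (assumed layout).
    Raises ValueError if the mask contains no real token.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy example: three positions, the last one is padding.
states = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
embedding = l2_normalize(last_token_pool(states, mask))
```

Because only the final position is read, whatever the prompt ends with determines which position is pooled, which is why the wrapper has to reproduce the format the model was trained on.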
Add two opt-in, per-modality wrapper flags:
--embd-prompt-text "...{content}..." (text inputs)
--embd-prompt-image "...{media}..." (image inputs; request content ignored)
{content} -> request text; {media} -> the server's media marker (one per
image), which solves the random-per-restart marker problem so clients send
raw content unchanged. \n/\t escapes are processed (string_process_escapes)
so values survive INI/preset quoting. Empty (default) = raw input, no change.
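
The substitution described above can be sketched roughly as follows. This is an illustrative Python model, not the server code: `process_escapes` only mimics the `\n`/`\t` handling mentioned here, the template strings and marker value are made up, and expanding `{media}` to one marker per image is an assumption about what "one per image" means.

```python
def process_escapes(value: str) -> str:
    """Turn the two-character sequences \\n and \\t into real newline/tab."""
    escapes = {"n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(value):
        nxt = value[i + 1] if i + 1 < len(value) else ""
        if value[i] == "\\" and nxt in escapes:
            out.append(escapes[nxt])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)

def wrap_text(template: str, content: str) -> str:
    """Empty template means raw input; otherwise fill in {content}."""
    if not template:
        return content
    # Escapes are resolved on the template only, so backslashes in the
    # request text are left alone (an assumption about ordering).
    return process_escapes(template).replace("{content}", content)

def wrap_image(template: str, raw_content: str, media_marker: str,
               n_images: int = 1) -> str:
    """Empty template means raw input; otherwise the request content is
    ignored and {media} becomes one marker per image (assumed)."""
    if not template:
        return raw_content
    return process_escapes(template).replace("{media}", media_marker * n_images)

# Hypothetical templates, written the way a preset file would store them.
text_prompt = wrap_text(r"Task: retrieve\nQuery: {content}", "a red bicycle")
image_prompt = wrap_image(r"[img]{media}[/img]\nSummarize.", "ignored", "<M>", 2)
```

Since the server fills in its own marker, a client never needs to know the marker value, which is the point of the `{media}` placeholder.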
Modality is inferred from the request (image_data field / OAI image_url
parts). The recipe is symmetric (query == document) so no role field is
needed.
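
A rough sketch of the modality check, assuming the request has already been parsed into a dict. The `image_data` field and `image_url` part type come from the description above; the keys searched for OAI-style parts (`input`, `content`) and the function names are assumptions for illustration and may not match the server's request schema.

```python
def has_image_url_part(value) -> bool:
    """True if value is a list of OAI-style parts containing an image_url part."""
    return isinstance(value, list) and any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in value
    )

def infer_modality(request: dict) -> str:
    """Pick which wrapper applies: 'image' or 'text'."""
    # Native form: a non-empty image_data field.
    if request.get("image_data"):
        return "image"
    # OAI form: image_url parts (the keys checked here are assumed).
    if any(has_image_url_part(request.get(key)) for key in ("input", "content")):
        return "image"
    return "text"

native = {"content": "ignored", "image_data": [{"id": 1, "data": "..."}]}
oai = {"input": [{"type": "image_url", "image_url": {"url": "data:..."}}]}
plain = {"content": "a red bicycle"}
```

Because the same wrapper is used for queries and documents, the result only needs to distinguish text from image and no role is consulted.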
Validated on LCO-Embedding-Omni-3B-2605 (last-token pooling, normalize):
raw input gave cross-modal margin +0.002 (near-random); with the wrappers,
text same-vs-cross margin +0.30 and cross-modal +0.30 with correct ranking.
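
The exact margin definition is not spelled out above. One plausible reading, sketched below as an assumption, is the mean cosine similarity of matching pairs minus the mean over non-matching pairs, with "correct ranking" meaning each query's best match is its own document. The vectors are toy values, not measurements.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def margin(queries, docs):
    """Mean matching similarity minus mean non-matching similarity.

    queries[i] is assumed to match docs[i]; needs at least two pairs.
    """
    n = len(queries)
    same = [cosine(queries[i], docs[i]) for i in range(n)]
    cross = [cosine(queries[i], docs[j])
             for i in range(n) for j in range(n) if i != j]
    return sum(same) / len(same) - sum(cross) / len(cross)

def ranks_correctly(queries, docs):
    """True if every query is most similar to its own document."""
    return all(
        max(range(len(docs)), key=lambda j: cosine(q, docs[j])) == i
        for i, q in enumerate(queries)
    )

# Toy embeddings: text queries vs. image documents.
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
toy_margin = margin(text_vecs, image_vecs)
```

A margin near zero means matching and non-matching pairs are indistinguishable, which is what the raw-input figure above indicates.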
Problem
Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni) require their inputs wrapped in a task instruction + the chat template, then last-token pooled. The /embedding path tokenizes raw content, so the trained embedding position is never hit and retrieval, especially cross-modal text↔image, collapses to near-random. (This is the follow-up to #126, which stopped images being dropped but left retrieval quality poor.)

Measured on LCO-Embedding-Omni-3B-2605 (last-token, normalize): raw input → cross-modal margin +0.002 (≈random); the correct recipe → +0.30, correct ranking.

Change
Two opt-in, per-modality wrapper flags (empty = off = current raw behavior):
--embd-prompt-text "…{content}…": text inputs; {content} → request text.
--embd-prompt-image "…{media}…": image inputs; {media} → the server's media marker (one per image), request content ignored.

{media} substitution solves the random-per-restart marker problem, so clients send raw content unchanged (zero client change). \n/\t escapes are processed (string_process_escapes) so values survive INI/preset quoting. Modality is inferred from the request (image_data field / OAI image_url parts). The recipe is symmetric (query == document) so no role field is needed.

Validation
Built + run with the flags via a preset-style literal-\n CLI (proves escape resolution). Sending raw requests, the server applies the recipe:

Deployed via the omniswap preset's per-model block (LCO only; qwen3-embed / embeddinggemma stay raw). Follows #126 / #125 / #127.