feat(server): instruction + chat-template wrappers for embedding inputs#130

Merged: marksverdhei merged 1 commit into ht from feat/embd-instruction-prompt on Jun 16, 2026

@marksverdhei

Problem

Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni) require their inputs wrapped in a task instruction + the chat template, then last-token pooled. The /embedding path tokenizes raw content, so the trained embedding position is never hit and retrieval — especially cross-modal text↔image — collapses to near-random. (This is the follow-up to #126, which stopped images being dropped but left retrieval quality poor.)

Measured on LCO-Embedding-Omni-3B-2605 (last-token, normalize): raw input → cross-modal margin +0.002 (≈random); the correct recipe → +0.30, correct ranking.
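For readers unfamiliar with the "last-token, normalize" setting, here is a minimal Python sketch of what that pooling step means. It is not the server's code: the function names and the toy hidden states are hypothetical, and a real model produces one high-dimensional vector per token.

```python
import math

def last_token_pool(hidden_states):
    """Last-token pooling: use the final token's hidden state as the embedding."""
    if not hidden_states:
        raise ValueError("no tokens to pool")
    return list(hidden_states[-1])

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy per-token hidden states (hypothetical values, 3 tokens x 2 dims).
states = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
embedding = l2_normalize(last_token_pool(states))
```

Because only the final position is read, whatever the template places at the end of the sequence determines which position the embedding comes from, which is why the wrapper matters for these models.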

Change

Two opt-in, per-modality wrapper flags (empty = off = current raw behavior):

  • --embd-prompt-text "…{content}…" — text inputs; {content} → request text.
  • --embd-prompt-image "…{media}…" — image inputs; {media} → the server's media marker (one per image), request content ignored.

Because {media} is substituted server-side, clients never need to know the media marker, which is random per restart; they keep sending raw content unchanged (zero client change). \n/\t escapes are processed (string_process_escapes) so values survive INI/preset quoting. Modality is inferred from the request (image_data field or OAI image_url parts). The recipe is symmetric (query == document), so no role field is needed.
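The substitution rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server implementation: `process_escapes` is a simplified stand-in for `string_process_escapes`, `apply_embd_prompt` and its parameters are hypothetical names, the template strings and marker value are placeholders rather than the real recipe, and concatenating one marker per image is only one possible reading of "one per image".

```python
def process_escapes(s):
    """Resolve literal backslash-n / backslash-t sequences.

    Simplified stand-in for string_process_escapes (assumption: only these
    two escapes are handled here).
    """
    return s.replace("\\n", "\n").replace("\\t", "\t")

def apply_embd_prompt(text, n_images, prompt_text, prompt_image, media_marker):
    """Wrap an embedding input according to its modality.

    An empty template means "off": the raw input is returned unchanged.
    """
    if n_images > 0:
        if not prompt_image:
            return text  # wrapper disabled: current raw behavior
        # Request content is ignored for image inputs; {media} expands to
        # one marker per image (joined without separator: an assumption).
        template = process_escapes(prompt_image)
        return template.replace("{media}", media_marker * n_images)
    if not prompt_text:
        return text  # wrapper disabled: current raw behavior
    return process_escapes(prompt_text).replace("{content}", text)

# Hypothetical templates containing a literal backslash-n, as a preset would.
wrapped_text = apply_embd_prompt("red circle", 0, "Instruct:\\n{content}", "", "<M>")
wrapped_image = apply_embd_prompt("ignored", 2, "", "img: {media}\\n", "<M>")
raw_text = apply_embd_prompt("red circle", 0, "", "", "<M>")
```

Here `n_images` stands in for the modality inference step; how the server counts images from `image_data` or `image_url` parts is not modeled.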

Validation

Built and ran with the flags passed on a preset-style command line containing literal \n sequences, which confirms escape resolution. With clients sending raw requests, the server applies the recipe:

  • text same−cross margin: +0.21 (raw) → +0.30 (wrapped)
  • cross-modal q="red circle": red 0.65 vs blue 0.35, margin +0.30, correct ranking (raw: +0.002)
  • output L2-normalized ✓
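As a reading aid for the figures above, the margin appears to be the similarity to the matching item minus the similarity to the non-matching item (0.65 − 0.35 = +0.30 in the cross-modal case). A minimal Python sketch of that computation, assuming cosine similarity and using hypothetical toy vectors rather than real model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def margin(query, positive, negative):
    """Similarity to the matching item minus similarity to the non-matching one."""
    return cosine(query, positive) - cosine(query, negative)

# Toy 2-d vectors (hypothetical): the query sits closer to `positive`.
query = [1.0, 0.0]
positive = [0.8, 0.6]
negative = [0.6, 0.8]
toy_margin = margin(query, positive, negative)  # approximately 0.2
```

A margin near zero means the model cannot tell the matching item from the non-matching one, which is what the raw-input figure of +0.002 indicates.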

Deployed via the omniswap preset's per-model block (LCO only; qwen3-embed / embeddinggemma stay raw). Follows #126 / #125 / #127.

Instruction-tuned embedding models (e.g. LCO-Embedding-Omni / Qwen2.5-Omni)
need their inputs wrapped in a task instruction + the chat template, then
last-token pooled. Without it the trained embedding position is never hit
and retrieval — especially cross-modal text<->image — collapses to
near-random. The /embedding path tokenized raw content, so these models
served poor embeddings even though images were no longer dropped.

Add two opt-in, per-modality wrapper flags:
  --embd-prompt-text  "...{content}..."  (text inputs)
  --embd-prompt-image "...{media}..."    (image inputs; request content ignored)
{content} -> request text; {media} -> the server's media marker (one per
image), which solves the random-per-restart marker problem so clients send
raw content unchanged. \n/\t escapes are processed (string_process_escapes)
so values survive INI/preset quoting. Empty (default) = raw input, no change.

Modality is inferred from the request (image_data field / OAI image_url
parts). The recipe is symmetric (query == document) so no role field is
needed.

Validated on LCO-Embedding-Omni-3B-2605 (last-token pooling, normalize):
raw input gave cross-modal margin +0.002 (near-random); with the wrappers,
text same-vs-cross margin +0.30 and cross-modal +0.30 with correct ranking.
marksverdhei merged commit 64890ea into ht on Jun 16, 2026
20 of 23 checks passed
marksverdhei deleted the feat/embd-instruction-prompt branch on June 16, 2026 15:24