[BUG] Embedding services wrong / incorrect use of queryPrefix #11707

@dmih

Description

Problem (one or two sentences)

In code, there are definitions like this:

	"nomic-embed-code": {
		dimension: 3584,
		scoreThreshold: 0.15,
		queryPrefix: "Represent this query for searching relevant code: ",
	},

queryPrefix is then used as the instruction prefix for the embedder, for both indexing and queries. This is incorrect. Nearly all modern embedders (including this nomic model, and by my estimate >95% of the others) require DIFFERENT instructions for indexing and for queries, or, AT LEAST, expect documents to be indexed without any prefix and queries to be sent with one.

Context (who is affected and when)

Embedding performance with self-hosted embedders is subpar because of this.

Most public paid embedding APIs, however, are not affected, because many of them are, indeed, instruction-less.

Reproduction steps

Enable Indexing. Observe the verbose logs of the prompts sent to the embedder on the llama-cpp side (as an example). Look for improper use of the queryPrefix:
If no queryPrefix is defined, no prefix is used at all (incorrect).
If one is defined, it is used for BOTH indexing and search (also incorrect for most embedders).

Expected result

We should have different templates for indexing and for querying.
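One possible shape for this, as a minimal sketch only: the `documentPrefix` field, the `EmbeddingPurpose` type, and the `applyPrefix` helper are hypothetical names introduced for illustration, not existing code in the project. The `nomic-embed-code` entry reuses the values quoted above and leaves the document side without a prefix.

```typescript
// Hypothetical profile shape. `documentPrefix` is an assumed new field;
// the other fields mirror the definition quoted in this issue.
interface EmbeddingModelProfile {
	dimension: number
	scoreThreshold?: number
	queryPrefix?: string // prepended to search queries only
	documentPrefix?: string // prepended to indexed chunks only (assumed field)
}

// Hypothetical discriminator for the two embedding call sites.
type EmbeddingPurpose = "index" | "query"

const profiles: Record<string, EmbeddingModelProfile> = {
	"nomic-embed-code": {
		dimension: 3584,
		scoreThreshold: 0.15,
		queryPrefix: "Represent this query for searching relevant code: ",
		// No documentPrefix: code chunks are embedded as-is.
	},
}

// Picks the prefix that matches the purpose and avoids double-prefixing.
function applyPrefix(text: string, modelId: string, purpose: EmbeddingPurpose): string {
	const profile = profiles[modelId]
	const prefix = purpose === "query" ? profile?.queryPrefix : profile?.documentPrefix
	if (!prefix || text.startsWith(prefix)) {
		return text
	}
	return prefix + text
}
```

With a helper along these lines, the indexing path would call `applyPrefix(chunk, modelId, "index")` and the search path `applyPrefix(query, modelId, "query")`, so a model that needs two different instructions only has to define both fields.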

Actual result

We currently have a single template shared by indexing and querying, which is incorrect for most modern self-hosted embedders.

Variations tried (optional)

No response

App Version

3.50.3 (79d11ff)

API Provider (optional)

OpenAI Compatible

Model Used (optional)

No response

Roo Code Task Links (optional)

No response

Relevant logs or errors (optional)
