Add intfloat/multilingual-e5-large-instruct model support #622

Open
RimoGuin wants to merge 1 commit into qdrant:main from RimoGuin:feat/add-multilingual-e5-large-instruct

@RimoGuin RimoGuin commented Apr 6, 2026

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New models submission:

  • Have you added an explanation of why it's important to include this model?

Closes #140. This was previously attempted in #181 but was not completed. intfloat/multilingual-e5-large-instruct is a state-of-the-art multilingual embedding model that supports instruction-based embeddings across 100+ languages. It outperforms multilingual-e5-large on MTEB benchmarks and is widely used for multilingual retrieval tasks. In my own experience, multilingual-e5-large-instruct performs noticeably better than multilingual-e5-large on retrieval tasks (including in the other supported languages).
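For context on what "instruction-based" means here: the model card for intfloat/multilingual-e5-large-instruct formats each query as a one-line task instruction followed by the query text, while documents are embedded without a prefix. The sketch below only shows that string format. The helper name and the example task and query are illustrative and not part of this PR, and whether fastembed applies such a prefix automatically is not stated here, so the string is built by hand.

```python
def format_instruct_query(task_description: str, query: str) -> str:
    """Prefix a query with a task instruction, E5-instruct style.

    Hypothetical helper for illustration; it is not a fastembed function.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


# Example task description and query (illustrative values only).
task = "Given a web search query, retrieve relevant passages that answer the query"
formatted_query = format_instruct_query(task, "how to bake sourdough bread")

# Documents are passed to the model as plain text, without an instruction.
documents = ["Sourdough bread is leavened with a fermented starter."]
```

The formatted query and the plain documents would then be embedded separately and compared by similarity.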

  • Have you added tests for the new model? Were canonical values for tests computed via the original model?

Yes, tests were added. Note that the canonical values were computed using fastembed itself, not the original model via sentence-transformers.

  • Have you added the code snippet for how canonical values were computed?
from fastembed import TextEmbedding
import numpy as np

model = TextEmbedding(model_name="intfloat/multilingual-e5-large-instruct")
docs = ["hello world", "flag embedding"]
embeddings = list(model.embed(docs))
vec = np.array(embeddings[0])
print("First 5 values:", vec[:5].tolist())
  • Have you successfully run tests with your changes locally?

Yes, verified via a standalone script that the canonical vector matches within atol=1e-3.
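The standalone script itself is not included in this PR. As a rough sketch of what such a check could look like, assuming it compares the leading values of an embedding against a stored canonical vector using an absolute tolerance only: the function name is hypothetical, and the numbers are placeholders rather than the model's real output.

```python
import math
from typing import Sequence


def matches_canonical(
    embedding: Sequence[float],
    canonical: Sequence[float],
    atol: float = 1e-3,
) -> bool:
    """Return True if the first len(canonical) values of `embedding`
    are each within `atol` of the corresponding canonical value."""
    if len(embedding) < len(canonical):
        return False
    # zip stops at the shorter sequence, so only the leading values are compared.
    return all(
        math.isclose(e, c, rel_tol=0.0, abs_tol=atol)
        for e, c in zip(embedding, canonical)
    )


# Placeholder numbers for illustration only; NOT real model output.
canonical = [0.0100, -0.0200, 0.0300, -0.0400, 0.0500]
embedding = [0.0104, -0.0203, 0.0299, -0.0401, 0.0502] + [0.0] * 1019

ok = matches_canonical(embedding, canonical)
```

With NumPy arrays, a roughly equivalent check is `np.allclose(embedding[:len(canonical)], canonical, rtol=0, atol=1e-3)`.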
cc @hh-space-invader @joein

Note: #181 previously attempted to add this model but required a manual ONNX export, as official ONNX support was (I believe) unavailable at the time. The ONNX model is now officially available on the model's HuggingFace page, making this a clean addition without any manual export.


coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

This pull request adds support for a new ONNX-based text embedding model, intfloat/multilingual-e5-large-instruct, by extending the supported models registry in the FastEmbed library. The model entry specifies an embedding dimension of 1024, references the Hugging Face model source, and declares the ONNX artifact locations (model file and data file). A corresponding entry was added to the canonical vector values dictionary in the tests to support validation of the new model.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and clearly describes the main change: adding support for the intfloat/multilingual-e5-large-instruct model. |
| Linked Issues check | ✅ Passed | The PR successfully implements the primary objective from issue #140 by adding the intfloat/multilingual-e5-large-instruct model to the supported models with proper configuration and tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to adding the new model: model entry in onnx_embedding.py and corresponding test canonical vector in test file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The pull request description directly addresses the changeset by explaining the addition of the intfloat/multilingual-e5-large-instruct model, providing justification, test details, and verification steps. |
