Add intfloat/multilingual-e5-large-instruct model support #622
RimoGuin wants to merge 1 commit into qdrant:main from
Walkthrough
This pull request adds support for a new ONNX-based text embedding model,
Estimated code review effort: 1 (Trivial) | ~3 minutes
Pre-merge checks: 5 passed
All Submissions:
New models submission:
Closes #140. This was previously attempted in #181 but was not completed.
intfloat/multilingual-e5-large-instruct is a state-of-the-art multilingual embedding model that supports instruction-based embeddings across 100+ languages. It outperforms multilingual-e5-large on MTEB benchmarks and is widely used for multilingual retrieval tasks. In my own use, multilingual-e5-large-instruct has been noticeably better than multilingual-e5-large on retrieval tasks, including in other supported languages.
Yes, canonical values were computed using fastembed itself (not sentence-transformers).
Yes, verified via a standalone script that the canonical vector matches within atol=1e-3.
cc @hh-space-invader @joein
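For reviewers, here is a minimal sketch of the kind of tolerance check described above. It is not the standalone script itself. The helper name `matches_canonical` and the sample numbers are hypothetical, and it assumes a purely absolute tolerance applied to the leading components of the embedding, which may differ from how the actual script or the test suite compares vectors.

```python
from typing import Sequence


def matches_canonical(
    embedding: Sequence[float],
    canonical: Sequence[float],
    atol: float = 1e-3,
) -> bool:
    """Return True if every compared component differs by at most `atol`."""
    # Assumption: the canonical vector holds only the first few dimensions,
    # so the embedding must be at least that long.
    if len(embedding) < len(canonical):
        return False
    # zip() stops at the shorter sequence, i.e. the canonical prefix.
    return all(abs(e - c) <= atol for e, c in zip(embedding, canonical))


# Hypothetical values for illustration only; not the model's real output.
canonical_prefix = [0.0123, -0.0456, 0.0789]
computed = [0.0125, -0.0451, 0.0790, 0.0312]

print(matches_canonical(computed, canonical_prefix))
```

In a real check, `computed` would be the vector produced by fastembed for a fixed input text rather than hard-coded numbers.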