Fix performance issue: Avoid re-initializing SentenceTransformer and remove duplicate milvus_search definition #186
Ayush-kathil wants to merge 3 commits into kubeflow:main
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Hi, I’ve submitted a PR fixing this issue by moving SentenceTransformer initialization to the global scope and removing duplicate function definitions. This significantly improves performance and code clarity. Would appreciate feedback!
f1c478d to 63baf14
Instantiate the SentenceTransformer at module level in server-https to avoid recreating the encoder for each milvus_search call, and update milvus_search to use embedding_model.encode(...). Remove the duplicated milvus_search implementation from server/app.py to centralize the search logic and reduce redundancy and overhead from repeated model loads.
Signed-off-by: Ayush-kathil <kathilshiva@gmail.com>
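The module-level pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `StubEncoder` is a hypothetical stand-in for `SentenceTransformer` so that the snippet runs without downloading a model, the value of `EMBEDDING_MODEL` is a placeholder, and the Milvus query itself is left out.

```python
# Hypothetical stand-in for sentence_transformers.SentenceTransformer so the
# sketch runs without downloading a model. It counts how often it is built.
class StubEncoder:
    instances = 0

    def __init__(self, model_name: str) -> None:
        StubEncoder.instances += 1
        self.model_name = model_name

    def encode(self, text: str) -> list[float]:
        # Fake one-dimensional "embedding"; a real model returns a dense vector.
        return [float(len(text))]


EMBEDDING_MODEL = "example-model"  # assumed placeholder, not the PR's value

# Built once at import time. In the real service this line would construct
# the actual encoder, e.g. SentenceTransformer(EMBEDDING_MODEL).
embedding_model = StubEncoder(EMBEDDING_MODEL)


def milvus_search(query: str) -> list[float]:
    # Reuse the shared encoder instead of constructing one per call.
    # The actual Milvus lookup is omitted from this sketch.
    return embedding_model.encode(query)


for q in ("first query", "second query", "third query"):
    milvus_search(q)

# Number of encoder constructions after three searches.
print(StubEncoder.instances)
```

However many times `milvus_search` is called, the encoder is constructed only once, when the module is imported. The trade-off is that the load cost is paid at import time, even if no search is ever made.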
This PR addresses a clear performance anti-pattern in the RAG pipeline. Previously, SentenceTransformer was instantiated inside milvus_search() on every call. The refactor ensures that the embedding model is initialized once and reused across requests, aligning with standard practice for ML model lifecycle management in backend services.
What’s good:
Suggestions / Minor improvements:
Overall, this is a meaningful performance improvement with no functional regression. Good contribution.
…s_search
Implemented thread-safe lazy-loading for SentenceTransformer to eliminate redundant loading within milvus_search.
Signed-off-by: Ayush-kathil <kathilshiva@gmail.com>
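One common way to get thread-safe lazy loading is double-checked locking around a module-level variable. The sketch below shows that technique under stated assumptions; it is not taken from the PR's diff. `StubEncoder`, `get_embedding_model`, `_embedding_model`, and `_model_lock` are hypothetical names, and the stub replaces `SentenceTransformer` so that the example runs on its own.

```python
import threading
from typing import Optional


class StubEncoder:
    """Hypothetical stand-in for SentenceTransformer; counts constructions."""

    instances = 0

    def __init__(self, model_name: str) -> None:
        StubEncoder.instances += 1
        self.model_name = model_name

    def encode(self, text: str) -> list[float]:
        return [float(len(text))]


EMBEDDING_MODEL = "example-model"  # assumed placeholder

_embedding_model: Optional[StubEncoder] = None
_model_lock = threading.Lock()


def get_embedding_model() -> StubEncoder:
    """Return the shared encoder, building it on first use only."""
    global _embedding_model
    if _embedding_model is None:  # fast path: no lock once loaded
        with _model_lock:
            if _embedding_model is None:  # re-check while holding the lock
                _embedding_model = StubEncoder(EMBEDDING_MODEL)
    return _embedding_model


def milvus_search(query: str) -> list[float]:
    # The Milvus lookup is omitted; only the encoder reuse is shown.
    return get_embedding_model().encode(query)


# Several concurrent first calls still construct the encoder exactly once.
threads = [
    threading.Thread(target=milvus_search, args=(f"query {i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(StubEncoder.instances)
```

Compared with loading at import time, this defers the load cost to the first request, and the lock keeps concurrent first requests from each loading their own copy.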
Description
This PR resolves a critical performance and memory bottleneck in the RAG pipeline caused by redundant instantiation of SentenceTransformer.
Core Changes
Performance Impact
Validation
Please let me know if any refinements or additional checks are required before merge.
Fixes #128
Problem:
SentenceTransformer(EMBEDDING_MODEL) was instantiated inside the milvus_search() function, causing repeated model loading on every request, leading to latency spikes and increased memory usage. Additionally, duplicate definitions of milvus_search existed, causing ambiguity.
Solution:
Impact:
Tested locally and observed faster response times for repeated queries.
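A local comparison of this kind could be set up along the following lines. This is only an illustrative harness, not the test that was actually run: `SlowStubEncoder` is a hypothetical stand-in whose constructor sleeps to mimic model loading, and the function names and the 0.05 s delay are assumptions chosen for the example.

```python
import time


class SlowStubEncoder:
    """Hypothetical stand-in whose constructor sleeps to mimic model loading."""

    def __init__(self, load_seconds: float = 0.05) -> None:
        time.sleep(load_seconds)

    def encode(self, text: str) -> list[float]:
        return [float(len(text))]


def search_reloading(query: str) -> list[float]:
    # Behavior described under "Problem": a new encoder on every call.
    return SlowStubEncoder().encode(query)


shared_encoder = SlowStubEncoder()  # loaded once, before timing starts


def search_reusing(query: str) -> list[float]:
    # Behavior after the fix: every call shares one encoder.
    return shared_encoder.encode(query)


def time_calls(fn, queries) -> float:
    """Return the wall-clock seconds taken to run fn over all queries."""
    start = time.perf_counter()
    for q in queries:
        fn(q)
    return time.perf_counter() - start


queries = ["what is kubeflow?"] * 5
reloading_s = time_calls(search_reloading, queries)
reusing_s = time_calls(search_reusing, queries)
print(f"reloading: {reloading_s:.3f}s, reusing: {reusing_s:.3f}s")
```

With a real model the gap depends on model size, disk cache, and hardware, so the figures from this stub say nothing about the actual service.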