
perf: Move SentenceTransformer to module level to prevent reload on every query #180

Open
24f3005089 wants to merge 2 commits into kubeflow:main from 24f3005089:fix/sentence-transformer-caching


@24f3005089

Fixes #178

Summary

This PR fixes a performance issue where the SentenceTransformer model was reloaded from disk on every search request, adding 500 ms to 2 s of latency to each call.

Changes

  • Moved the encoder = SentenceTransformer(EMBEDDING_MODEL) initialization from inside the milvus_search() function to module level
  • Applied the fix to both server/app.py and server-https/app.py
  • The model is now loaded once at startup and reused for all queries
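A minimal sketch of the before/after pattern, assuming a simplified milvus_search(query) signature. FakeSentenceTransformer and the EMBEDDING_MODEL value are hypothetical stand-ins: the real class loads model weights from disk, which the stand-in replaces with a construction counter. The Milvus query itself is left out.

```python
class FakeSentenceTransformer:
    """Hypothetical stand-in for sentence_transformers.SentenceTransformer.

    It skips the expensive weight loading and only counts how often it is built."""

    load_count = 0

    def __init__(self, model_name: str) -> None:
        FakeSentenceTransformer.load_count += 1
        self.model_name = model_name

    def encode(self, text: str) -> list[float]:
        # Toy vector; a real model returns a dense embedding.
        return [float(len(text))]


EMBEDDING_MODEL = "example-model"  # placeholder; the real value is set in app.py


def milvus_search_before(query: str) -> list[float]:
    # Before: the model is constructed (i.e. loaded from disk) on every call.
    encoder = FakeSentenceTransformer(EMBEDDING_MODEL)
    return encoder.encode(query)


# After: the model is constructed once, when the module is imported.
encoder = FakeSentenceTransformer(EMBEDDING_MODEL)


def milvus_search(query: str) -> list[float]:
    # Reuses the module-level encoder; the Milvus query itself is omitted.
    return encoder.encode(query)
```

Three calls to milvus_search() leave the construction count at 1, while three calls to milvus_search_before() raise it by three, one load per query.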

Impact

  • Eliminates the 500 ms to 2 s model-loading overhead on every query
  • Particularly beneficial in agentic RAG workflows, where search_kubeflow_docs may be invoked multiple times per conversation turn and the overhead would otherwise be paid on each call
  • Avoids repeated disk reads and memory allocations for the model weights, improving overall responsiveness
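Loading at import time moves the cost to server startup, and any test or tool that imports app.py pays it as well. If deferring the load to the first request were preferred, a cached accessor gives the same load-once behavior. This is an alternative sketch, not what this PR implements; get_encoder(), FakeSentenceTransformer and the EMBEDDING_MODEL value are hypothetical.

```python
import functools


class FakeSentenceTransformer:
    """Hypothetical stand-in for sentence_transformers.SentenceTransformer."""

    instances = 0

    def __init__(self, model_name: str) -> None:
        FakeSentenceTransformer.instances += 1
        self.model_name = model_name

    def encode(self, text: str) -> list[float]:
        # Toy vector; a real model returns a dense embedding.
        return [float(len(text))]


EMBEDDING_MODEL = "example-model"  # placeholder value


@functools.lru_cache(maxsize=None)
def get_encoder() -> FakeSentenceTransformer:
    # Built on the first call, then returned from the cache on every later call.
    return FakeSentenceTransformer(EMBEDDING_MODEL)


def milvus_search(query: str) -> list[float]:
    # Hypothetical signature; the Milvus query itself is omitted.
    return get_encoder().encode(query)
```

The Python documentation notes that lru_cache keeps its cache coherent across threads but may run the wrapped function more than once if concurrent calls arrive before the first one finishes, so a multi-threaded server could load the model twice on a cold start. The module-level approach in this PR sidesteps that, since module import runs only once.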

Testing

Code change only; no new tests were added, but the syntax and logic were verified. The model is now initialized at module import time, and the module-level encoder is reused across all calls to milvus_search().
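A regression test could pin the load-once behavior down. The sketch below is self-contained: SentenceTransformer is a hypothetical stand-in class, and load_app() is a hypothetical helper that mimics what importing app.py now does (build the encoder once, then reuse it in milvus_search()). It swaps the constructor for a unittest.mock.MagicMock and checks that three searches construct it only once. Against the real server/app.py, the mock would have to be in place before the module is imported or reloaded, because the encoder is now created at import time.

```python
import unittest
from unittest import mock


class SentenceTransformer:
    """Hypothetical stand-in for sentence_transformers.SentenceTransformer."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def encode(self, text: str) -> list[float]:
        return [0.0]


EMBEDDING_MODEL = "example-model"  # placeholder value


def load_app():
    """Hypothetical helper mimicking an import of app.py after this PR:
    the encoder is built once and captured by milvus_search()."""
    encoder = SentenceTransformer(EMBEDDING_MODEL)

    def milvus_search(query: str) -> list[float]:
        return encoder.encode(query)

    return milvus_search


class EncoderLoadedOnceTest(unittest.TestCase):
    def test_encoder_built_once_across_queries(self) -> None:
        fake_cls = mock.MagicMock()
        # Replace the constructor in this module's namespace with a recording mock.
        with mock.patch.dict(globals(), {"SentenceTransformer": fake_cls}):
            milvus_search = load_app()
            for query in ["pipelines", "katib", "kserve"]:
                milvus_search(query)
        # One construction, three encode() calls on the shared instance.
        fake_cls.assert_called_once_with(EMBEDDING_MODEL)
        self.assertEqual(fake_cls.return_value.encode.call_count, 3)


if __name__ == "__main__":
    unittest.main()
```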

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign franciscojavierarceo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: SentenceTransformer model reloaded on every search request

1 participant