fix: cache SentenceTransformer model at startup instead of per-request #190

Open
adarshkumar23 wants to merge 1 commit into kubeflow:main from adarshkumar23:fix/cache-embedding-model
@adarshkumar23
Summary

Cache the SentenceTransformer embedding model at module level instead of re-initializing it on every search request.

Fixes #XX (replace with your issue number)

Problem

milvus_search() in both server/app.py and server-https/app.py calls SentenceTransformer(EMBEDDING_MODEL) on every invocation. Each load of all-mpnet-base-v2 takes ~2-5 seconds and allocates ~400MB. In agentic workflows with 3-5 tool calls per turn, this adds roughly 6-25 seconds of unnecessary latency per turn.

Changes

  • server/app.py — move model init to module level
  • server-https/app.py — move model init to module level
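
The change above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper names `load_embedding_model`, `get_embedding_model`, and `embed_query` are hypothetical, the value of `EMBEDDING_MODEL` is assumed, and the sketch uses a lazily cached accessor rather than a plain module-level assignment so the file can be imported without loading the weights. Calling `get_embedding_model()` once during server startup gives the eager, load-once behaviour described here.

```python
from functools import lru_cache

# Assumed value; the real constant is defined in server/app.py.
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"


def load_embedding_model(name: str):
    """Expensive step: constructs the model and reads its weights."""
    # Imported here so that importing this module stays cheap.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


@lru_cache(maxsize=1)
def get_embedding_model():
    """Return the shared model, constructing it on the first call only."""
    return load_embedding_model(EMBEDDING_MODEL)


def embed_query(query: str) -> list[float]:
    """Hypothetical stand-in for the embedding step inside milvus_search()."""
    model = get_embedding_model()  # cached after the first call
    return model.encode(query).tolist()
```

With this shape, five tool calls in one turn construct the model once instead of five times. One caveat: `lru_cache` does not prevent a duplicate load if two threads make the very first call at the same time, so warming the cache at startup (or assigning the model at module level, as this PR does) avoids that race.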

Performance Impact

| Metric | Before | After |
| --- | --- | --- |
| Model loads per query (5 tool calls) | 5 | 0 |
| Overhead per query | ~15s | ~0s |
| Memory allocation per query | ~2GB | 0 |
| Startup time | unchanged | +3-5s (one-time) |

The embedding model (~400MB) was being re-initialized inside
milvus_search() on every call. For agentic RAG workflows with
multiple tool calls per user turn, this added seconds of latency
per query.

Move model initialization to module level so it loads once at
server startup and is reused across all requests.

Signed-off-by: Adarsh Kumar <adarsh23072005@gmail.com>
Signed-off-by: adarshkumar23 <131923092+adarshkumar23@users.noreply.github.com>
@google-oss-prow
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign franciscojavierarceo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
