fix: cache SentenceTransformer model at startup instead of per-request by adarshkumar23 · Pull Request #190 · kubeflow/docs-agent

adarshkumar23 · 2026-03-30T15:45:02Z

Summary

Cache the SentenceTransformer embedding model at module level instead of re-initializing it on every search request.

Fixes #XX (replace with your issue number)

Problem

milvus_search() in both server/app.py and server-https/app.py calls SentenceTransformer(EMBEDDING_MODEL) on every invocation. Loading all-mpnet-base-v2 takes ~2-5 seconds and allocates ~400MB each time. In agentic workflows with 3-5 tool calls per turn, this adds roughly 6-25 seconds of unnecessary latency per turn (10-25 seconds at 5 calls).

Changes

server/app.py — move model init to module level
server-https/app.py — move model init to module level
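
A minimal sketch of the pattern behind both changes, not the actual diff. `StubEmbedder` is a hypothetical stand-in for `SentenceTransformer` so the example runs without downloading a model, and the function names `search_per_request` and `search_cached`, the `_model` variable, and the `EMBEDDING_MODEL` value are assumptions for illustration. In the real files, the module-level line would presumably read `SentenceTransformer(EMBEDDING_MODEL)`.

```python
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"  # assumed value


class StubEmbedder:
    """Hypothetical stand-in for SentenceTransformer that counts constructions."""

    load_count = 0  # class-level counter: how many times a "model" was loaded

    def __init__(self, name: str) -> None:
        # The real constructor is the expensive step (seconds, hundreds of MB).
        StubEmbedder.load_count += 1
        self.name = name

    def encode(self, text: str) -> list[float]:
        # Toy "embedding": just the text length, enough to show the call shape.
        return [float(len(text))]


def search_per_request(query: str) -> list[float]:
    """Before: a new model is constructed inside every call."""
    model = StubEmbedder(EMBEDDING_MODEL)
    return model.encode(query)


# After: the model is constructed once, when the module is imported.
_model = StubEmbedder(EMBEDDING_MODEL)


def search_cached(query: str) -> list[float]:
    """After: every call reuses the module-level instance."""
    return _model.encode(query)
```

Calling `search_cached` five times leaves `StubEmbedder.load_count` at 1 (the import-time load), while five calls to `search_per_request` add five more loads. The trade-off is that the one-time cost moves to process startup.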

Performance Impact

Metric	Before	After
Model loads per query (5 tool calls)	5	0
Overhead per query	~15s	~0s
Memory allocation per query	~2GB (5 × ~400MB)	0
Startup time	baseline	+3-5s (one-time)
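
The table figures follow from simple multiplication. As a rough check, the sketch below reproduces them from the numbers quoted in the Problem section. The helper name `reload_overhead` is hypothetical, and the ~3 s per load used for the ~15s figure is an assumed midpoint of the stated 2-5 s range.

```python
def reload_overhead(tool_calls: int, load_seconds: float) -> float:
    """Latency added per query when the model is reloaded on every tool call."""
    return tool_calls * load_seconds


# Range implied by 3-5 tool calls at 2-5 s per load.
lowest = reload_overhead(3, 2.0)    # 3 calls, fastest load
highest = reload_overhead(5, 5.0)   # 5 calls, slowest load

# The table's "~15s" row: 5 tool calls at an assumed ~3 s per load.
typical = reload_overhead(5, 3.0)

# The table's "~2GB" row: 5 loads of a ~400 MB model.
memory_mb = 5 * 400

print(lowest, highest, typical, memory_mb)
```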

The embedding model (~400MB) was being re-initialized inside milvus_search() on every call. For agentic RAG workflows with multiple tool calls per user turn, this added seconds of latency per query.

Move model initialization to module level so it loads once at server startup and is reused across all requests.

Signed-off-by: Adarsh Kumar <adarsh23072005@gmail.com>
Signed-off-by: adarshkumar23 <131923092+adarshkumar23@users.noreply.github.com>

google-oss-prow · 2026-03-30T15:45:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign franciscojavierarceo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot requested a review from franciscojavierarceo March 30, 2026 15:45

google-oss-prow bot added the size/S label Mar 30, 2026

adarshkumar23 wants to merge 1 commit into kubeflow:main from adarshkumar23:fix/cache-embedding-model
