SentenceTransformer model reloaded on every search request #189

@adarshkumar23

Problem

The milvus_search() function in both server/app.py (line 77) and server-https/app.py (line 123) initializes a new SentenceTransformer instance on every call:

encoder = SentenceTransformer(EMBEDDING_MODEL)

Loading sentence-transformers/all-mpnet-base-v2 takes ~2-5 seconds and ~400 MB of memory each time. In agentic RAG workflows, where a single user turn can trigger multiple tool calls, the reload cost is paid on every call and adds 10-25 seconds of unnecessary latency per query.

Proposed Fix

Initialize the model once at module level and reuse it across all requests.
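
A minimal sketch of one way to do this, not the project's actual code. It uses a cached accessor instead of a bare module-level global, so the model is loaded on first use and the same instance is returned afterwards. The names `get_encoder` and `embed_query` are hypothetical, and the value of `EMBEDDING_MODEL` is assumed from the model named above; the rest of `milvus_search()` is not shown in this issue, so it is left out.

```python
from functools import lru_cache

# Assumed value; the real constant is defined in server/app.py.
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"


@lru_cache(maxsize=1)
def get_encoder(model_name: str = EMBEDDING_MODEL):
    """Load the SentenceTransformer once and return the cached instance."""
    # Imported here so that importing this module stays cheap; the heavy
    # model load happens only on the first call.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed_query(query: str):
    """Hypothetical helper: encode a query with the shared encoder."""
    # Replaces the per-call `encoder = SentenceTransformer(EMBEDDING_MODEL)`.
    encoder = get_encoder()
    return encoder.encode(query)
```

Inside `milvus_search()`, the per-call constructor would then become `encoder = get_encoder()`. Calling `get_encoder()` once during server startup gives the eager, module-level behaviour described above and keeps the load cost off the first search request. Note that `lru_cache` does not guarantee a single load if several threads make the very first call at the same time, so a startup warm-up is the safer choice in a multi-threaded server.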
