## Problem

The `milvus_search()` function in both `server/app.py` (line 77) and `server-https/app.py` (line 123) initializes a new `SentenceTransformer` instance on every call:

```python
encoder = SentenceTransformer(EMBEDDING_MODEL)
```

Loading `sentence-transformers/all-mpnet-base-v2` takes ~2-5 seconds and ~400 MB of memory each time. In agentic RAG workflows with multiple tool calls per user turn, this adds 10-25 seconds of unnecessary latency per query.

## Proposed Fix

Initialize the model once at module level and reuse it across all requests.
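A minimal sketch of the fix. The `SentenceTransformer` stand-in class and the 3-dimensional dummy vectors are illustrative only, so the snippet runs without downloading the model; the real code would keep its existing `from sentence_transformers import SentenceTransformer` import and the change is just hoisting the constructor call out of the handler:

```python
class SentenceTransformer:
    """Stand-in for sentence_transformers.SentenceTransformer (illustration only)."""
    load_count = 0  # counts constructions to show the model loads exactly once

    def __init__(self, model_name: str):
        type(self).load_count += 1  # the real __init__ pays the ~2-5 s load here
        self.model_name = model_name

    def encode(self, sentences):
        # The real encode() returns 768-dim embeddings for all-mpnet-base-v2;
        # dummy vectors keep this sketch self-contained.
        return [[0.0, 0.0, 0.0] for _ in sentences]


EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"

# Load once at import time instead of inside the handler.
encoder = SentenceTransformer(EMBEDDING_MODEL)


def milvus_search(query: str):
    # Reuse the module-level encoder; no per-call construction.
    return encoder.encode([query])[0]
```

If paying the load cost at import time is undesirable (e.g. for fast server startup or tests that never hit search), the same single-instance guarantee can be had lazily by wrapping the constructor in a `functools.lru_cache(maxsize=1)` getter and calling that from `milvus_search()`.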