Performance Bug: SentenceTransformer is re-initialized on every Milvus search request #128

### Bug Description
In `server/app.py`, `SentenceTransformer(EMBEDDING_MODEL)` is instantiated inside the `milvus_search()` function, so the application loads the entire embedding model into memory on every tool call. This results in severe latency spikes (multiple seconds per search) and risks OOM crashes under concurrent WebSocket connections.

### Proposed Solution
Move the `SentenceTransformer` initialization to module (global) scope so that a single instance is loaded once at server startup. The `milvus_search()` function should then reference this shared instance instead of constructing its own.
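
A minimal sketch of the idea, not the actual patch. It uses a cached accessor rather than a bare module-level assignment (a small variation on the proposal) so the snippet can be imported without the dependency installed. The `get_embedder()` helper, the `embedder` parameter, the `milvus_search()` signature, and the placeholder model name are all assumptions for illustration; the real values live in `server/app.py`.

```python
from functools import lru_cache

# Assumption: placeholder value; the real constant is defined in server/app.py.
EMBEDDING_MODEL = "all-MiniLM-L6-v2"


@lru_cache(maxsize=1)
def get_embedder():
    """Load the embedding model once per process and reuse it afterwards."""
    # Imported lazily so this sketch can be imported without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(EMBEDDING_MODEL)


def milvus_search(query, embedder=None):
    """Hypothetical signature: embed the query with the shared model."""
    # Reuse the cached instance instead of constructing a new model per call.
    model = embedder if embedder is not None else get_embedder()
    vector = model.encode(query)
    # The existing Milvus query that consumes `vector` would follow here,
    # unchanged; this sketch simply returns the embedding.
    return vector
```

Calling `get_embedder()` once during server startup would warm the cache, so the first search request does not pay the load cost and concurrent first requests do not race to load the model.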

I have tested this locally and it drastically reduces tool execution latency. I will open a PR with the fix shortly.
