feat: add an optional local reranker on top of OpenAI embeddings

## Problem statement

AILSS currently gets most of its retrieval power from OpenAI embeddings, but the Python-first
agent backend still lacks a bounded local reasoning stage between retrieval and answer
construction.

Today, the `apps/api` agent workflow is `retrieve -> decide -> read -> answer -> validate`. Within it:

- retrieval can already return grounded candidates from the local SQLite + `sqlite-vec` index
- `decide_context` still relies on simple heuristics
- `validate_answer` only checks that citations exist, not that the selected evidence was the
  best available match

This leaves a gap between "good enough vector retrieval" and "high-confidence grounded answer
selection". We want a local, low-cost improvement that strengthens orchestration and quality
without replacing the existing OpenAI embedding path.

## Proposed solution

Add an optional **local reranker stage** on top of the existing OpenAI embedding retrieval path.

Scope the first implementation to the Python backend and its Obsidian-managed runtime:

- Keep OpenAI embeddings as the source of truth for indexing and query embeddings
- Add a small local reranker model that scores the top semantic retrieval candidates
- Reorder candidates before `decide_context` selects final notes for reading / answer building
- Record reranker usage and latency in eval and run artifacts so the effect is measurable

Design direction:

- Python-first boundary:
  - implement reranking in `apps/api` first
  - do not require index DB schema changes for v1
  - keep the MCP / Node retrieval path unchanged unless the Python-first experiment proves out
- Bounded local-model role:
  - use the local model for pairwise or query-document relevance scoring only
  - do not introduce a local generative answer model in this issue
  - fail fast if reranking is explicitly enabled but the local model cannot load
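A minimal sketch of the fail-fast rule. It assumes a hypothetical `loader` callable that turns a model name into a scoring function; the real loading code would depend on which reranker runtime is chosen. `Scorer`, `RerankerUnavailableError`, and `load_reranker` are illustrative names, not existing AILSS APIs.

```python
from typing import Callable, Optional

# Hypothetical scorer type: (query, documents) -> one relevance score per document.
Scorer = Callable[[str, list[str]], list[float]]


class RerankerUnavailableError(RuntimeError):
    """Raised when reranking is enabled but the local model cannot be loaded."""


def load_reranker(
    enabled: bool, model_name: str, loader: Callable[[str], Scorer]
) -> Optional[Scorer]:
    if not enabled:
        # Baseline path: no reranker is loaded and retrieval order is unchanged.
        return None
    try:
        return loader(model_name)
    except Exception as exc:
        # Fail fast: surface the error instead of silently using another model path.
        raise RerankerUnavailableError(
            f"reranker {model_name!r} is enabled but could not be loaded"
        ) from exc
```

Returning `None` when disabled leaves the baseline path untouched, and chaining with `from exc` keeps the underlying load error visible without switching to a different model.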

Proposed file-level plan:

- `apps/api/src/ailss_api/config.py`
  - add settings for reranker enablement and runtime contract
  - example env vars:
    - `AILSS_API_RERANKER_ENABLED`
    - `AILSS_API_RERANKER_MODEL`
    - `AILSS_API_RERANK_CANDIDATES`
    - `AILSS_API_RERANK_KEEP_TOP_K`
- `apps/api/src/ailss_api/models.py`
  - extend retrieval / agent metrics with reranker metadata
  - example fields:
    - `reranker_model`
    - `reranker_latency_ms`
    - `reranker_candidates`
- new module: `apps/api/src/ailss_api/reranking.py`
  - define a small interface for reranking
  - suggested contract:
    - `class RerankCandidate(BaseModel): path, title, snippet, evidence_text`
    - `class RerankResult(BaseModel): path, score, rank`
    - `def rerank_candidates(query: str, candidates: list[RerankCandidate], settings: Settings) -> list[RerankResult]`
  - first implementation can wrap a local cross-encoder / reranker model
- `apps/api/src/ailss_api/retrieval.py` or `apps/api/src/ailss_api/agent.py`
  - integrate reranking immediately after retrieval and before note selection
  - recommended first insertion point:
    - keep `retrieve_notes()` as-is for baseline retrieval
    - rerank inside `_retrieve_context()` or right after retrieval results are returned
    - preserve raw retrieval results for observability when reranking is enabled
- `apps/api/src/ailss_api/evals.py`
  - include reranker on/off coverage in eval runs
  - compare retrieval and agent pass-rate changes with the same dataset
- `packages/obsidian-plugin/src/pythonApi/pythonApiServiceController.ts`
  - pass reranker env vars into the Python backend process once the backend contract is stable
- docs:
  - update `docs/architecture/python-first-local-agent-backend.md`
  - update `docs/03-plan.md`
  - add local-dev notes if extra runtime dependencies are needed
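The env vars proposed for `config.py` could be parsed roughly as below. This is a sketch, not the actual `Settings` class: the default values (`20` candidates, keep `5`), the accepted truthy strings, and the validation rules are assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RerankerSettings:
    enabled: bool
    model: Optional[str]
    candidates: int  # N: how many semantic retrieval hits the reranker rescores
    keep_top_k: int  # how many reranked hits are passed on to note selection


_TRUE = {"1", "true", "yes", "on"}  # assumed truthy spellings


def load_reranker_settings(env: Mapping[str, str] = os.environ) -> RerankerSettings:
    enabled = env.get("AILSS_API_RERANKER_ENABLED", "").strip().lower() in _TRUE
    model = env.get("AILSS_API_RERANKER_MODEL") or None
    # Default values below are illustrative assumptions, not values from this issue.
    candidates = int(env.get("AILSS_API_RERANK_CANDIDATES", "20"))
    keep_top_k = int(env.get("AILSS_API_RERANK_KEEP_TOP_K", "5"))
    if candidates < 1 or keep_top_k < 1:
        raise ValueError("rerank candidate counts must be positive")
    if keep_top_k > candidates:
        raise ValueError(
            "AILSS_API_RERANK_KEEP_TOP_K must not exceed AILSS_API_RERANK_CANDIDATES"
        )
    if enabled and model is None:
        # Enabling the reranker without naming a model is a configuration error.
        raise ValueError(
            "AILSS_API_RERANKER_MODEL is required when AILSS_API_RERANKER_ENABLED is set"
        )
    return RerankerSettings(enabled, model, candidates, keep_top_k)
```

Validating at startup means a misconfigured reranker fails before any request is served, which matches the fail-fast direction above.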

Suggested runtime behavior:

- semantic retrieval returns the top `N` candidates
- the local reranker rescores only those `N` candidates
- final agent selection uses the reranked order
- responses and artifacts expose whether reranking was used
- eval output makes it easy to compare:
  - baseline retrieval only
  - retrieval + local reranker
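The runtime behavior above can be sketched as follows, reusing the contract names suggested for `reranking.py`. Assumptions: plain dataclasses stand in for the suggested pydantic `BaseModel` types, the model is injected as a hypothetical `scorer` callable instead of being resolved from `Settings`, the text fed to the scorer (title plus evidence text, falling back to the snippet) is an illustrative choice, and `rerank_with_metrics` with its `reranker_used` / `raw_retrieval_paths` fields is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RerankCandidate:
    path: str
    title: str
    snippet: str
    evidence_text: str


@dataclass(frozen=True)
class RerankResult:
    path: str
    score: float
    rank: int  # 1-based position after reranking


# Hypothetical scorer: one relevance score per (query, document) pair, higher is better.
Scorer = Callable[[str, list[str]], list[float]]


def rerank_candidates(
    query: str, candidates: list[RerankCandidate], scorer: Scorer, keep_top_k: int
) -> list[RerankResult]:
    # Illustrative choice: score title plus evidence text, or the snippet if no evidence.
    docs = [f"{c.title}\n{c.evidence_text or c.snippet}" for c in candidates]
    scores = scorer(query, docs)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return exactly one score per candidate")
    # sorted() is stable, so ties keep the original semantic-retrieval order.
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [
        RerankResult(path=candidates[i].path, score=scores[i], rank=rank)
        for rank, i in enumerate(order[:keep_top_k], start=1)
    ]


def rerank_with_metrics(
    query: str,
    raw_results: list[RerankCandidate],
    scorer: Scorer,
    model_name: str,
    pool_size: int,
    keep_top_k: int,
) -> tuple[list[RerankResult], dict]:
    # Only the top-N semantic hits are rescored; raw_results itself is not modified.
    pool = raw_results[:pool_size]
    start = time.perf_counter()
    reranked = rerank_candidates(query, pool, scorer, keep_top_k)
    latency_ms = (time.perf_counter() - start) * 1000.0
    metrics = {
        "reranker_used": True,  # hypothetical extra field
        "reranker_model": model_name,
        "reranker_latency_ms": round(latency_ms, 2),
        "reranker_candidates": len(pool),
        "raw_retrieval_paths": [c.path for c in raw_results],  # hypothetical extra field
    }
    return reranked, metrics
```

Because the raw retrieval list is left as-is, it can be written to run artifacts next to the reranked order, which keeps the baseline-vs-reranker comparison inspectable per request.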

Acceptance direction:

- no regression when the reranker is disabled
- explicit failure when the reranker is enabled but unavailable
- measurable eval comparison on the existing local dataset runner
- no hidden fallback from the local reranker to a different model path
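For the eval comparison, a hypothetical helper that compares a baseline run and a reranker run over the same dataset might look like this. `EvalCaseResult` and `compare_runs` are illustrative names; the real runner in `evals.py` may record per-case results differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCaseResult:
    case_id: str
    passed: bool


def pass_rate(results: list[EvalCaseResult]) -> float:
    if not results:
        raise ValueError("cannot compute a pass rate over zero cases")
    return sum(r.passed for r in results) / len(results)


def compare_runs(baseline: list[EvalCaseResult], reranked: list[EvalCaseResult]) -> dict:
    base = {r.case_id: r.passed for r in baseline}
    rr = {r.case_id: r.passed for r in reranked}
    # Both runs must cover the same cases exactly once, or the comparison is misleading.
    if len(base) != len(baseline) or len(rr) != len(reranked) or base.keys() != rr.keys():
        raise ValueError("runs must cover the same dataset cases exactly once")
    base_rate, rr_rate = pass_rate(baseline), pass_rate(reranked)
    return {
        "baseline_pass_rate": base_rate,
        "reranked_pass_rate": rr_rate,
        "delta": rr_rate - base_rate,
        # Per-case lists show where reranking helped or hurt, not just the net change.
        "regressions": sorted(c for c in base if base[c] and not rr[c]),
        "improvements": sorted(c for c in base if rr[c] and not base[c]),
    }
```

Listing regressions alongside the aggregate delta makes it harder for a net gain to hide cases that reranking broke.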

## Constraints / context (optional)

- Must stay aligned with issue #175 and the Python-first local agent backend direction
- Must remain local-first and single-user for this phase
- Must keep OpenAI embeddings in place for indexing and query embeddings
- Must not expand write authority or change the current explicit-write safety model
- Should prefer a small local reranker over a local generative model for the first step
- Should preserve reproducible eval and observability so the quality tradeoff is visible
- If multilingual vault support matters, prefer a multilingual reranker candidate during model selection

