feat: add an optional local reranker on top of OpenAI embeddings #184

@maehwasoo

Problem statement

AILSS currently gets most of its retrieval power from OpenAI embeddings, but the Python-first
agent backend still lacks a bounded local reasoning stage between retrieval and answer
construction.

Today, the apps/api workflow is:

  • retrieve -> decide -> read -> answer -> validate
  • retrieval can already return grounded candidates from the local SQLite + sqlite-vec index
  • decide_context still relies on simple heuristics
  • validate_answer only checks that citations exist, not that the selected evidence was the
    best available match

This leaves a gap between "good enough vector retrieval" and "high-confidence grounded answer
selection". We want a local, low-cost improvement that strengthens orchestration and quality
without replacing the existing OpenAI embedding path.

Proposed solution

Add an optional local reranker stage on top of the existing OpenAI embedding retrieval path.

Scope the first implementation to the Python backend and its Obsidian-managed runtime:

  • Keep OpenAI embeddings as the source of truth for indexing and query embeddings
  • Add a small local reranker model that scores the top semantic retrieval candidates
  • Reorder candidates before decide_context selects final notes for reading / answer building
  • Record reranker usage and latency in eval and run artifacts so the effect is measurable

Design direction:

  • Python-first boundary:
    • implement reranking in apps/api first
    • do not require index DB schema changes for v1
    • keep the MCP / Node retrieval path unchanged unless the Python-first experiment proves out
  • Bounded local-model role:
    • use the local model for pairwise or query-document relevance scoring only
    • do not introduce a local generative answer model in this issue
    • fail fast if reranking is explicitly enabled but the local model cannot load
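The fail-fast rule above could look roughly like the following. This is a minimal sketch, not the planned implementation: `load_reranker`, `RerankerUnavailableError`, the `Scorer` type, and the injected `loader` callable are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

# Hypothetical scorer type: (query, document_text) -> relevance score.
Scorer = Callable[[str, str], float]


class RerankerUnavailableError(RuntimeError):
    """Raised when reranking is enabled but the local model cannot load."""


def load_reranker(
    enabled: bool,
    model_name: str,
    loader: Callable[[str], Scorer],
) -> Optional[Scorer]:
    """Return a scorer when enabled and None when disabled."""
    if not enabled:
        return None  # baseline path: retrieval order is used unchanged
    try:
        return loader(model_name)
    except Exception as exc:
        # Fail fast instead of quietly reverting to embedding-only ranking.
        raise RerankerUnavailableError(
            f"reranker enabled but model {model_name!r} could not load"
        ) from exc


def broken_loader(name: str) -> Scorer:
    # Stand-in for a real model loader that fails, e.g. missing weights.
    raise FileNotFoundError(name)
```

Passing the loader in as a callable keeps the failure path testable without a real model on disk; with `enabled=False` the loader is never called.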

Proposed file-level plan:

  • apps/api/src/ailss_api/config.py
    • add settings for reranker enablement and runtime contract
    • example env vars:
      • AILSS_API_RERANKER_ENABLED
      • AILSS_API_RERANKER_MODEL
      • AILSS_API_RERANK_CANDIDATES
      • AILSS_API_RERANK_KEEP_TOP_K
  • apps/api/src/ailss_api/models.py
    • extend retrieval / agent metrics with reranker metadata
    • example fields:
      • reranker_model
      • reranker_latency_ms
      • reranker_candidates
  • new module: apps/api/src/ailss_api/reranking.py
    • define a small interface for reranking
    • suggested contract:
      • class RerankCandidate(BaseModel): path, title, snippet, evidence_text
      • class RerankResult(BaseModel): path, score, rank
      • def rerank_candidates(query: str, candidates: list[RerankCandidate], settings: Settings) -> list[RerankResult]
    • first implementation can wrap a local cross-encoder / reranker model
  • apps/api/src/ailss_api/retrieval.py or apps/api/src/ailss_api/agent.py
    • integrate reranking immediately after retrieval and before note selection
    • recommended first insertion point:
      • keep retrieve_notes() as-is for baseline retrieval
      • rerank inside _retrieve_context() or right after retrieval results are returned
      • preserve raw retrieval results for observability when reranking is enabled
  • apps/api/src/ailss_api/evals.py
    • include reranker on/off coverage in eval runs
    • compare retrieval and agent pass-rate changes with the same dataset
  • packages/obsidian-plugin/src/pythonApi/pythonApiServiceController.ts
    • pass reranker env vars into the Python backend process once the backend contract is stable
  • docs:
    • update docs/architecture/python-first-local-agent-backend.md
    • update docs/03-plan.md
    • add local-dev notes if extra runtime dependencies are needed
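As a rough illustration of the config and reranking.py items in the plan above, the settings and contract might be sketched as follows. Several things here are assumptions rather than decisions from this issue: dataclasses stand in for the pydantic `BaseModel` classes so the sketch runs on the standard library alone, the default values (20 and 5) and accepted truthy strings are invented, `RerankerSettings` and `settings_from_env` are hypothetical names, and the extra `score_fn` parameter plus the toy `overlap_score` only stand in for a real local cross-encoder.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class RerankerSettings:
    enabled: bool = False
    model: str = ""
    candidates: int = 20  # assumed default, not specified in the issue
    keep_top_k: int = 5   # assumed default, not specified in the issue


def settings_from_env(env: Mapping[str, str] = os.environ) -> RerankerSettings:
    """Read the example env vars; accepted truthy values are an assumption."""
    flag = env.get("AILSS_API_RERANKER_ENABLED", "").strip().lower()
    return RerankerSettings(
        enabled=flag in {"1", "true", "yes"},
        model=env.get("AILSS_API_RERANKER_MODEL", ""),
        candidates=int(env.get("AILSS_API_RERANK_CANDIDATES", "20")),
        keep_top_k=int(env.get("AILSS_API_RERANK_KEEP_TOP_K", "5")),
    )


@dataclass(frozen=True)
class RerankCandidate:
    path: str
    title: str
    snippet: str
    evidence_text: str


@dataclass(frozen=True)
class RerankResult:
    path: str
    score: float
    rank: int


def rerank_candidates(
    query: str,
    candidates: list[RerankCandidate],
    settings: RerankerSettings,
    score_fn: Callable[[str, str], float],  # hypothetical injection point
) -> list[RerankResult]:
    """Score the first `candidates` items and keep the best `keep_top_k`."""
    pool = candidates[: settings.candidates]
    scored = [(score_fn(query, c.evidence_text), c.path) for c in pool]
    # Stable sort: ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = scored[: settings.keep_top_k]
    return [
        RerankResult(path=path, score=score, rank=rank)
        for rank, (score, path) in enumerate(kept, start=1)
    ]


def overlap_score(query: str, text: str) -> float:
    # Toy scorer (shared lowercase tokens); a placeholder for a real model.
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

Because the results carry only `path`, `score`, and `rank`, nothing in this shape requires an index DB schema change.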

Suggested runtime behavior:

  • semantic retrieval returns top N candidates
  • the local reranker rescores only those N candidates
  • final agent selection uses reranked order
  • responses and artifacts expose whether reranking was used
  • eval output makes it easy to compare:
    • baseline retrieval only
    • retrieval + local reranker
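The runtime behavior listed above might be wired together along these lines. This is a sketch under stated assumptions: `select_with_optional_rerank`, its parameters, and the returned dict layout are hypothetical, and only the three `reranker_*` keys mirror the example metric fields named earlier in this issue.

```python
import time
from typing import Callable, Optional


def select_with_optional_rerank(
    query: str,
    retrieved: list[tuple[str, str]],  # (path, evidence_text) in retrieval order
    scorer: Optional[Callable[[str, str], float]],
    model_name: str = "",
    top_n: int = 20,
    keep_top_k: int = 5,
) -> dict:
    """Return the final order plus the raw order and reranker metadata."""
    raw_paths = [path for path, _ in retrieved]
    if scorer is None:
        # Reranker disabled: baseline order passes through untouched.
        return {
            "raw": raw_paths,
            "final": raw_paths,
            "reranker_used": False,
            "reranker_model": None,
            "reranker_latency_ms": None,
            "reranker_candidates": 0,
        }
    pool = retrieved[:top_n]  # rescore only the top N candidates
    started = time.perf_counter()
    ranked = sorted(pool, key=lambda item: scorer(query, item[1]), reverse=True)
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {
        "raw": raw_paths,  # preserved for observability
        "final": [path for path, _ in ranked[:keep_top_k]],
        "reranker_used": True,
        "reranker_model": model_name,
        "reranker_latency_ms": latency_ms,
        "reranker_candidates": len(pool),
    }
```

Returning both `raw` and `final` in one structure lets a run artifact show what the reranker changed.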

Acceptance direction:

  • no regression when reranker is disabled
  • explicit failure when reranker is enabled but unavailable
  • measurable eval comparison on the existing local dataset runner
  • no hidden fallback from local reranker back to a different model path
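The "measurable eval comparison" item could be approximated as below. This is only an illustrative sketch: the `(query, expected_path)` case shape, the hit-within-top-k pass criterion, and the names `pass_rate` and `compare_modes` are assumptions, not the format of the existing dataset runner.

```python
from typing import Callable

# Hypothetical dataset shape: (query, expected_path).
EvalCase = tuple[str, str]
# Hypothetical selector: maps a query to an ordered list of note paths.
Selector = Callable[[str], list[str]]


def pass_rate(cases: list[EvalCase], select: Selector, k: int = 3) -> float:
    """Fraction of cases whose expected path appears in the top k results."""
    if not cases:
        return 0.0
    hits = sum(1 for query, expected in cases if expected in select(query)[:k])
    return hits / len(cases)


def compare_modes(
    cases: list[EvalCase],
    baseline: Selector,
    reranked: Selector,
    k: int = 3,
) -> dict[str, float]:
    """Run both modes on the same dataset and report the difference."""
    base = pass_rate(cases, baseline, k)
    rer = pass_rate(cases, reranked, k)
    return {"baseline": base, "reranked": rer, "delta": rer - base}
```

Running both selectors over the same cases keeps the comparison reproducible; a negative `delta` would indicate the reranker hurt selection on that dataset.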

Constraints / context (optional)

  • Must stay aligned with issue #175 (feat: establish a Python-first local agent backend baseline) and the Python-first local agent backend direction
  • Must remain local-first and single-user for this phase
  • Must keep OpenAI embeddings in place for indexing and query embeddings
  • Must not expand write authority or change the current explicit-write safety model
  • Should prefer a small local reranker over a local generative model for the first step
  • Should preserve reproducible eval and observability so the quality tradeoff is visible
  • If multilingual vault support matters, prefer a multilingual reranker candidate during model selection

Labels

enhancement (New feature or request), monorepo (Cross-cutting changes)
