Ingestion
- Read all Markdown files from a specified directory.
- Chunk and preprocess the content for embedding.
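A minimal sketch of ingestion follows. It assumes that `*.md` files are discovered recursively and that the text is split into fixed-size character windows with overlap. The helper names and the default sizes (500 characters, 50 overlap) are hypothetical and are not the project's actual settings.

```python
from pathlib import Path


def load_markdown_files(directory: str) -> dict[str, str]:
    """Read every .md file under `directory` (recursively) into {path: text}."""
    root = Path(directory)
    return {str(p): p.read_text(encoding="utf-8") for p in sorted(root.rglob("*.md"))}


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows; consecutive windows share `overlap` chars.

    Hypothetical defaults; the project's real chunker may differ.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    # Light preprocessing: collapse runs of whitespace (including newlines).
    text = " ".join(text.split())
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```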
Embedding
- Support multiple embedding providers:
  - Local (e.g., HuggingFace models, Ollama, vLLM)
  - Cloud APIs (OpenAI, Gemini)
- Store embeddings in a vector database.
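One way to keep embedding providers pluggable is to define a small common interface and a name-to-class registry. The sketch below assumes that design. `Embedder`, `HashingEmbedder`, `make_embedder`, and the provider keys are hypothetical names. `HashingEmbedder` is a toy stand-in meant only for tests. The SentenceTransformers and OpenAI classes import their SDKs lazily, and their default model names are example choices. Ollama, vLLM, and Gemini adapters would follow the same shape.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Anything that turns a batch of texts into a batch of vectors."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy, dependency-free hashed bag-of-words embedder (tests only, hypothetical)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Stable hash, unlike the per-process salted built-in hash().
                digest = hashlib.blake2b(token.encode(), digest_size=8).hexdigest()
                vec[int(digest, 16) % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])  # unit length for cosine search
        return vectors


class SentenceTransformerEmbedder:
    """Local embedder; model name is an example choice."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2") -> None:
        from sentence_transformers import SentenceTransformer  # lazy import

        self.model = SentenceTransformer(model_name)

    def embed(self, texts: list[str]) -> list[list[float]]:
        return self.model.encode(texts).tolist()  # numpy array -> nested lists


class OpenAIEmbedder:
    """Cloud embedder; the OpenAI client reads OPENAI_API_KEY from the environment."""

    def __init__(self, model: str = "text-embedding-3-small") -> None:
        from openai import OpenAI  # lazy import

        self.client = OpenAI()
        self.model = model

    def embed(self, texts: list[str]) -> list[list[float]]:
        response = self.client.embeddings.create(model=self.model, input=texts)
        return [item.embedding for item in response.data]


# Hypothetical registry keys; a config file or CLI flag would pick one.
PROVIDERS = {
    "hashing": HashingEmbedder,
    "sentence-transformers": SentenceTransformerEmbedder,
    "openai": OpenAIEmbedder,
}


def make_embedder(name: str, **kwargs) -> Embedder:
    return PROVIDERS[name](**kwargs)
```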
Vector Database
- Use a pluggable vector DB (e.g., Chroma, FAISS, Qdrant, Milvus).
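A pluggable vector DB can sit behind a two-method interface (`add`, `search`). The sketch below assumes that interface and uses the hypothetical names `VectorStore`, `InMemoryStore`, `ChromaStore`, and `Hit`. It provides a brute-force in-memory store for tests and a thin adapter over a persistent Chroma collection configured for cosine distance. The storage path and collection name are placeholders.

```python
import math
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Hit:
    text: str
    source: str
    score: float  # cosine similarity: higher means more similar


class VectorStore(Protocol):
    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str], sources: list[str]) -> None: ...
    def search(self, query: list[float], k: int) -> list[Hit]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class InMemoryStore:
    """Brute-force store for tests and small corpora (hypothetical)."""

    rows: dict = field(default_factory=dict)  # id -> (vector, text, source)

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str], sources: list[str]) -> None:
        for row_id, vec, text, source in zip(ids, vectors, texts, sources):
            self.rows[row_id] = (vec, text, source)

    def search(self, query: list[float], k: int) -> list[Hit]:
        hits = [Hit(text, source, _cosine(query, vec)) for vec, text, source in self.rows.values()]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


class ChromaStore:
    """Adapter over a persistent Chroma collection; path and name are placeholders."""

    def __init__(self, path: str = "./chroma_db", name: str = "markdown_chunks") -> None:
        import chromadb  # lazy import

        client = chromadb.PersistentClient(path=path)
        self.collection = client.get_or_create_collection(name=name, metadata={"hnsw:space": "cosine"})

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str], sources: list[str]) -> None:
        self.collection.add(ids=ids, embeddings=vectors, documents=texts,
                            metadatas=[{"source": s} for s in sources])

    def search(self, query: list[float], k: int) -> list[Hit]:
        res = self.collection.query(query_embeddings=[query], n_results=k)
        return [
            Hit(text, meta["source"], 1.0 - dist)  # cosine distance -> similarity
            for text, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0])
        ]
```

Chroma reports cosine *distance*, so the adapter converts each result to a similarity (`1 - distance`). This keeps its scores comparable with those from the in-memory store.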
Retrieval
- For each user query, embed it with the same model used for the document chunks and retrieve the top-k most similar chunks from the vector DB.
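The query must be embedded with the same model as the chunks; otherwise the vectors are not comparable. The brute-force sketch below assumes an in-memory index of `(chunk, vector)` pairs and cosine similarity. `retrieve` and `EmbedFn` are hypothetical names. In a real deployment, the search itself would be delegated to the vector DB.

```python
import heapq
import math
from typing import Callable, Sequence

# Hypothetical signature: a batch of texts in, a batch of vectors out.
EmbedFn = Callable[[list[str]], list[list[float]]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, embed: EmbedFn, index: Sequence[tuple[str, list[float]]],
             k: int = 4) -> list[tuple[float, str]]:
    """Return the top-k (score, chunk) pairs for `query`, best first."""
    (query_vec,) = embed([query])  # same embedder that produced the index vectors
    scored = ((cosine(query_vec, vec), chunk) for chunk, vec in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```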
Generation
- Use a pluggable LLM backend (Ollama, vLLM, OpenAI API, Gemini API) to generate answers using retrieved context.
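The sketch below illustrates the generation step. It assumes a one-method `LLM` interface and a prompt that numbers each retrieved chunk so the answer can cite it. The `OllamaLLM` adapter posts to Ollama's `/api/generate` endpoint with streaming disabled. The model name `llama3`, the host, and the prompt wording are assumptions. OpenAI or Gemini backends would wrap their SDKs behind the same `generate` method.

```python
import json
import urllib.request
from typing import Protocol


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class OllamaLLM:
    """Calls a local Ollama server (non-streaming). Model and host are assumptions."""

    def __init__(self, model: str = "llama3", host: str = "http://localhost:11434") -> None:
        self.model, self.host = model, host

    def generate(self, prompt: str) -> str:
        body = json.dumps({"model": self.model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(f"{self.host}/api/generate", data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:  # POST, because data is set
            return json.loads(resp.read())["response"]


def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """chunks: (source, text) pairs from retrieval, numbered for citation."""
    context = "\n\n".join(f"[{i}] ({source})\n{text}" for i, (source, text) in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, chunks: list[tuple[str, str]], llm: LLM) -> str:
    return llm.generate(build_prompt(question, chunks))
```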
Interfaces
- CLI: Simple command-line chat interface.
- API: REST or WebSocket backend for chat-style frontends (e.g., React, Streamlit).
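Below is a minimal CLI chat loop. It assumes the pipeline is exposed as a hypothetical `answer_fn(question) -> (answer, source_files)`. Input and output are injected so that the loop can be tested without a terminal.

```python
from typing import Callable

# Hypothetical pipeline entry point: question -> (answer text, source file names).
AnswerFn = Callable[[str], tuple[str, list[str]]]


def chat_loop(answer_fn: AnswerFn,
              read: Callable[[str], str] = input,
              write: Callable[[str], None] = print) -> None:
    """Simple REPL: 'exit'/'quit' or EOF ends the session; blank lines are skipped."""
    while True:
        try:
            question = read("You: ").strip()
        except EOFError:
            break
        if not question:
            continue
        if question.lower() in {"exit", "quit"}:
            break
        reply, sources = answer_fn(question)
        write(f"Bot: {reply}")
        if sources:
            write("Context files: " + ", ".join(sources))
```

Because the loop is independent of the pipeline, the same `answer_fn` could also sit behind the REST endpoint, which keeps both interfaces thin.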
Current Features
- Project scaffolded with modular Python code for ingestion, embedding, vector DB, retrieval, and generation.
- Ingestion and chunking of all Markdown files in a directory.
- Embedding via SentenceTransformers (local) and Gemini/OpenAI (cloud).
- Vector DB integration with Chroma (default, persistent).
- Retrieval of top-k relevant chunks from the vector DB for each query.
- Answer generation through pluggable LLM backends (Gemini, OpenAI, Ollama); of these, Gemini has been tested.
- CLI chat interface that displays the source files used as context.
- FastAPI web API for chat, ready for frontend integration (e.g., Streamlit, React).
- Environment-variable support for the Gemini API key.
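The following sketch reads the Gemini API key from the environment and fails early with a clear message. The variable name `GEMINI_API_KEY` is an assumption; use whatever name the project's setup instructions specify.

```python
import os


def get_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment (variable name is an assumption)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail at startup rather than on the first API call.
        raise RuntimeError(f"{var_name} is not set; export it before starting the CLI or API server.")
    return key
```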
Planned Improvements
- Smarter chunking strategies (e.g., by heading, sliding window, or sentence).
- Additional embedding, LLM, and vector DB backends (e.g., Ollama, vLLM, FAISS, Qdrant, Milvus).
- Frontend web chat UI (e.g., Streamlit, React, or Vue) for a modern chat experience.
- Source highlighting: show not just file names but also the specific chunk(s) used for each answer.
- Config file or CLI/API options for model selection, chunk size, etc.
- Logging and monitoring of queries, context, and answers.
- Evaluation scripts and/or tests for retrieval and answer quality.
- Dockerfile and deployment instructions.
- Unit/integration tests for ingestion, retrieval, and generation.
- (Optional) Streaming responses, user feedback, authentication, and advanced features.
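For the planned heading-based chunking, one possible sketch works as follows. It splits on Markdown ATX headings (`#` to `######`) and ignores `#` lines inside fenced code. It also records the heading trail for each chunk, which would also help with source highlighting, and slices oversized sections into fixed-size pieces. `chunk_by_heading` and its `max_chars=1500` default are hypothetical.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker; lines inside fences are never headings


def chunk_by_heading(markdown: str, max_chars: int = 1500) -> list[dict[str, str]]:
    """Split Markdown at headings, tagging each chunk with its heading path (hypothetical)."""
    chunks: list[dict[str, str]] = []
    path: list[str] = []  # current heading trail, e.g. ["Install", "Extras"]
    buf: list[str] = []   # body lines of the section being collected
    in_fence = False

    def flush() -> None:
        text = "\n".join(buf).strip()
        buf.clear()
        # Oversized sections fall back to fixed-size slices; empty ones add nothing.
        for start in range(0, len(text), max_chars):
            chunks.append({"headings": " > ".join(path), "text": text[start : start + max_chars]})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            del path[level - 1 :]  # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            buf.append(line)
    flush()
    return chunks
```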
Tech Stack
- Language: Python 3.10+
- Vector DB: Chroma (default), with optional support for FAISS, Qdrant, or Milvus
- Embeddings:
  - SentenceTransformers (local)
  - Ollama (local)
  - vLLM (local)
  - OpenAI API
  - Gemini API
- LLM Backends:
  - Ollama, vLLM, OpenAI API, Gemini API (pluggable)
- Frontend:
  - CLI (built-in)
  - REST API (FastAPI) for chat UIs (e.g., Streamlit, React)
- Other: