RAG Demo

PLAN

Ingestion
- Read all Markdown files from a specified directory.
- Chunk and preprocess the content for embedding.
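The two ingestion steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `read_markdown_files` and `chunk_text`, the character-based window, and the default sizes are all assumptions, and collapsing whitespace is only one possible preprocessing step.

```python
from pathlib import Path


def read_markdown_files(directory: str) -> dict[str, str]:
    """Return {relative path: text} for every .md file under `directory`."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.md"))
    }


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    text = " ".join(text.split())  # preprocessing (assumed): collapse whitespace
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the previous window has already reached the end of the text.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of embedding some text twice.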
Embedding
- Support multiple embedding providers:
  - Local (e.g., HuggingFace models, Ollama, vLLM)
  - Cloud APIs (OpenAI, Gemini)
- Store embeddings in a vector database.
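One way to keep providers interchangeable is a small shared interface plus a registry keyed by provider name. The sketch below is an assumption about how that could look: `Embedder`, `HashingEmbedder`, `PROVIDERS`, and `get_embedder` are hypothetical names, and the hashing embedder is a dependency-free stand-in for a real model, not a useful embedding.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Anything that maps a batch of texts to one vector per text."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: hashed bag-of-words, L2-normalised."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


# Hypothetical registry; a local or cloud provider would be one more entry
# whose class wraps the real client behind the same embed() method.
PROVIDERS = {"hashing": HashingEmbedder}


def get_embedder(name: str, **kwargs) -> Embedder:
    """Look up a provider by name and construct it."""
    if name not in PROVIDERS:
        raise ValueError(f"unknown embedding provider: {name!r}")
    return PROVIDERS[name](**kwargs)
```

Because callers only depend on `embed()`, switching from a local model to a cloud API would be a configuration change rather than a code change.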
Vector Database
- Use a pluggable vector DB (e.g., Chroma, FAISS, Qdrant, Milvus).
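"Pluggable" here could mean that every store exposes the same two operations, add and query. The following sketch assumes that shape; `VectorStore` and `InMemoryVectorStore` are hypothetical names, and the in-memory class is a brute-force stand-in for Chroma, FAISS, Qdrant, or Milvus, not a wrapper around any of them.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    def add(self, ids: list[str], vectors: list[list[float]], documents: list[str]) -> None: ...
    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, str, float]]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force store: keeps everything in a dict and scans it on each query."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, ids: list[str], vectors: list[list[float]], documents: list[str]) -> None:
        if not (len(ids) == len(vectors) == len(documents)):
            raise ValueError("ids, vectors and documents must have the same length")
        for id_, vector, document in zip(ids, vectors, documents):
            self._rows[id_] = (vector, document)  # re-adding an id overwrites it

    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, str, float]]:
        """Return up to k (id, document, cosine similarity) rows, best first."""
        scored = [
            (id_, document, _cosine(vector, stored))
            for id_, (stored, document) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[2], reverse=True)
        return scored[:k]
```

A linear scan is fine for a demo-sized corpus; the dedicated databases listed above exist largely because it stops scaling once the collection grows.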
Retrieval
- On user query, embed the query and retrieve relevant chunks from the vector DB.
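The query path mirrors ingestion: embed the query with the same function used for the chunks, score every chunk, and keep the best k. In this sketch `embed` is a hypothetical word-count stand-in for a real embedding model, and the scoring is done inline rather than delegated to a vector DB.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query as (score, chunk), best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "Chroma stores embeddings on disk",
        "Typer builds the CLI",
        "FastAPI serves the chat API",
    ]
    for score, chunk in retrieve("which library builds the CLI", corpus, k=2):
        print(f"{score:.2f}  {chunk}")
```

The important constraint is that queries and chunks go through the same embedding function; vectors from different models are not comparable.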
Generation
- Use a pluggable LLM backend (Ollama, vLLM, OpenAI API, Gemini API) to generate answers using retrieved context.
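Generation then reduces to building a prompt from the retrieved chunks and handing it to whichever backend is configured. The sketch below assumes a one-method backend interface; `LLMBackend`, `StubBackend`, `build_prompt`, and `answer` are hypothetical names, the prompt wording is illustrative, and the stub does not call any model.

```python
from typing import Protocol


class LLMBackend(Protocol):
    def generate(self, prompt: str) -> str: ...


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


class StubBackend:
    """Placeholder backend; a real one would send the prompt to a model."""

    def generate(self, prompt: str) -> str:
        return f"(stub answer, prompt was {len(prompt)} characters)"


def answer(question: str, chunks: list[str], backend: LLMBackend) -> str:
    return backend.generate(build_prompt(question, chunks))
```

Numbering the chunks in the prompt makes it possible to ask the model to cite which chunk an answer came from.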
Interfaces
- CLI: Simple command-line chat interface.
- API: REST or WebSocket backend for chat-style frontends (e.g., React, Streamlit).
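A command-line chat interface can be as small as a read-answer-print loop around the retrieval and generation steps. The sketch below uses plain text streams to stay dependency-free; `chat_loop`, the `you>`/`bot>` prompts, and the `answer_fn` signature (question in, answer plus source file names out) are assumptions for illustration.

```python
import sys
from typing import Callable, TextIO


def chat_loop(
    answer_fn: Callable[[str], tuple[str, list[str]]],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> None:
    """Read questions line by line until EOF or 'exit'/'quit'."""
    while True:
        stdout.write("you> ")
        stdout.flush()
        line = stdin.readline()
        if not line or line.strip().lower() in {"exit", "quit"}:
            break
        question = line.strip()
        if not question:
            continue  # ignore empty input and prompt again
        answer, sources = answer_fn(question)
        stdout.write(f"bot> {answer}\n")
        if sources:
            stdout.write("context files: " + ", ".join(sources) + "\n")


if __name__ == "__main__":
    # Hypothetical answer function standing in for retrieval + generation.
    chat_loop(lambda q: (f"you asked: {q}", ["README.md"]))
```

Passing the streams in as parameters keeps the loop testable, and the same `answer_fn` could back an HTTP endpoint so the CLI and the API share one code path.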

IMPLEMENTED

Project scaffolded with modular Python code for ingestion, embedding, vector DB, retrieval, and generation.
Supports ingestion and chunking of all Markdown files in a directory.
Embedding with SentenceTransformers (local) and Gemini/OpenAI (cloud) supported.
Vector DB integration with Chroma (default, persistent).
Retrieval of top-k relevant chunks from the vector DB for each query.
Generation of answers using pluggable LLM backends (Gemini, OpenAI, Ollama; so far, only Gemini has been tested).
CLI chat interface with context file display.
FastAPI web API for chat, ready for frontend integration (e.g., Streamlit or React).
Environment variable support for Gemini API key.
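Reading the key from the environment keeps it out of source control. The sketch below shows one way to do that with a clear failure message; the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions, since the name the project actually expects is not stated here.

```python
import os
from collections.abc import Mapping


def load_api_key(
    name: str = "GEMINI_API_KEY",  # assumed variable name
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key from the environment, or fail with a readable error."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"set the {name} environment variable before using the cloud backend")
    return key
```

Failing at startup with the variable name in the message is usually easier to debug than an authentication error from the first API call.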

NEXT STEPS

Smarter chunking strategies (e.g., by heading, sliding window, or sentence).
Additional embedding, LLM, and vector DB backends (Ollama, vLLM, FAISS, Qdrant, Milvus, etc.).
Frontend web chat UI (e.g., Streamlit, React, or Vue) for a modern chat experience.
Source highlighting: show not just file names but also the specific chunk(s) used for each answer.
Config file or CLI/API options for model selection, chunk size, etc.
Logging and monitoring of queries, context, and answers.
Evaluation scripts and/or tests for retrieval and answer quality.
Dockerfile and deployment instructions.
Unit/integration tests for ingestion, retrieval, and generation.
(Optional) Streaming responses, user feedback, authentication, and advanced features.
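Of the chunking strategies listed above, splitting by heading is the simplest to illustrate: each Markdown section becomes one chunk, labelled with its heading. This is a sketch under assumptions, not a planned implementation: `chunk_by_heading` is a hypothetical name, only ATX headings (`#` to `######`) are recognised, and `#` lines inside fenced code blocks would be misread as headings.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs; text before the first heading gets heading ''."""
    sections: list[tuple[str, str]] = []
    heading, lines = "", []

    def flush() -> None:
        # Skip the implicit leading section when it is empty.
        if heading or any(line.strip() for line in lines):
            sections.append((heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Keeping the heading alongside the body also gives the retrieval step a natural label to show when reporting which chunk an answer came from.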

TECH STACK

Language: Python 3.10+
Vector DB: Chroma (default), with optional support for FAISS, Qdrant, or Milvus
Embeddings:
- SentenceTransformers (local)
- Ollama (local)
- vLLM (local)
- OpenAI API
- Gemini API
LLM Backends:
- Ollama, vLLM, OpenAI API, Gemini API (pluggable)
Frontend:
- CLI (built-in)
- REST API (FastAPI) for chat UIs (e.g., Streamlit, React)
Other:
- Markdown for parsing
- Typer for CLI
- FastAPI for API
