awkto/rag-mvp

RAG Demo

PLAN

  1. Ingestion

    • Read all Markdown files from a specified directory.
    • Chunk and preprocess the content for embedding.
  2. Embedding

    • Support multiple embedding providers:
      • Local (e.g., HuggingFace models, Ollama, vLLM)
      • Cloud APIs (OpenAI, Gemini)
    • Store embeddings in a vector database.
  3. Vector Database

    • Use a pluggable vector DB (e.g., Chroma, FAISS, Qdrant, Milvus).
  4. Retrieval

    • On user query, embed the query and retrieve relevant chunks from the vector DB.
  5. Generation

    • Use a pluggable LLM backend (Ollama, vLLM, OpenAI API, Gemini API) to generate answers using retrieved context.
  6. Interfaces

    • CLI: Simple command-line chat interface.
    • API: REST or WebSocket backend for chat-style frontends (e.g., React, Streamlit).
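
The ingestion, embedding, vector DB, and retrieval steps above can be sketched end to end with only the standard library. This is a minimal illustration, not the repository's code: every name here (`Chunk`, `Embedder`, `HashingEmbedder`, `InMemoryStore`, `load_chunks`) is hypothetical, the hashed bag-of-words embedder is a toy stand-in for a real embedding model, and the in-memory list stands in for a real vector DB such as Chroma.

```python
import hashlib
import math
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Chunk:
    source: str  # path of the Markdown file the text came from
    text: str


class Embedder(Protocol):
    """Hypothetical interface any embedding provider would satisfy."""

    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: hashed bag-of-words, L2-normalised."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


def load_chunks(directory: str, chunk_size: int = 500) -> list[Chunk]:
    """Read every .md file under `directory` and cut it into fixed-size pieces."""
    chunks: list[Chunk] = []
    for path in sorted(Path(directory).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size].strip()
            if piece:
                chunks.append(Chunk(str(path), piece))
    return chunks


class InMemoryStore:
    """Stand-in for a vector DB: keeps (vector, chunk) pairs in a list."""

    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder
        self.items: list[tuple[list[float], Chunk]] = []

    def add(self, chunks: list[Chunk]) -> None:
        for chunk in chunks:
            self.items.append((self.embedder.embed(chunk.text), chunk))

    def query(self, question: str, k: int = 3) -> list[Chunk]:
        # Vectors are unit length, so the dot product is cosine similarity.
        q = self.embedder.embed(question)
        ranked = sorted(
            self.items,
            key=lambda item: sum(a * b for a, b in zip(q, item[0])),
            reverse=True,
        )
        return [chunk for _, chunk in ranked[:k]]
```

Because `InMemoryStore` depends only on the `Embedder` protocol, a local or cloud provider could be swapped in by supplying another class with the same `embed` method, which is the kind of pluggability the plan describes.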

WHAT'S BEEN DONE SO FAR

  • Project scaffolded with modular Python code for ingestion, embedding, vector DB, retrieval, and generation.
  • Supports ingestion and chunking of all markdown files in a directory.
  • Embedding with SentenceTransformers (local) and Gemini/OpenAI (cloud) supported.
  • Vector DB integration with Chroma (default, persistent).
  • Retrieval of top-k relevant chunks from the vector DB for each query.
  • Generation of answers using pluggable LLM backends (Gemini, OpenAI, Ollama); only Gemini has been tested so far.
  • CLI chat interface with context file display.
  • FastAPI web API for chat, ready for frontend integration (e.g., Streamlit or React).
  • Environment variable support for Gemini API key.
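
As a rough illustration of how retrieved chunks, the context file display, and the API key lookup might fit together, here is a small standard-library sketch. It is not taken from the repository: the function names, the prompt wording, and the `GEMINI_API_KEY` variable name are all assumptions made for the example.

```python
import os


def get_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment (variable name is an assumption)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def context_files(chunks: list[tuple[str, str]]) -> list[str]:
    """Return the source files of the retrieved chunks, de-duplicated in rank order."""
    seen: dict[str, None] = {}
    for source, _text in chunks:
        seen.setdefault(source, None)
    return list(seen)


def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a prompt that asks the LLM to answer from the retrieved context only."""
    context = "\n\n".join(f"[{source}]\n{text}" for source, text in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Each chunk is passed as a `(source, text)` pair, so the CLI can print `context_files(chunks)` next to the answer while the same list feeds `build_prompt`.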

WHAT STILL NEEDS TO BE IMPLEMENTED

  • Smarter chunking strategies (e.g., by heading, sliding window, or sentence).
  • Additional embedding, LLM, and vector DB backends (e.g., Ollama, vLLM, FAISS, Qdrant, Milvus).
  • Frontend web chat UI (e.g., Streamlit, React, or Vue) for a modern chat experience.
  • Source highlighting: show not just file names but also the specific chunk(s) used for each answer.
  • Config file or CLI/API options for model selection, chunk size, etc.
  • Logging and monitoring of queries, context, and answers.
  • Evaluation scripts and/or tests for retrieval and answer quality.
  • Dockerfile and deployment instructions.
  • Unit/integration tests for ingestion, retrieval, and generation.
  • (Optional) Streaming responses, user feedback, authentication, and advanced features.
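
Two of the chunking strategies listed above, splitting by heading and a sliding window, could look roughly like the following. This is a sketch under stated assumptions rather than a planned implementation: the function names are hypothetical, headings are assumed to be ATX style (`#` to `######`), and the window is measured in whitespace-separated words rather than tokens.

```python
import re

# ATX headings only. Limitation: a "# comment" line inside a fenced code
# block would also be treated as a heading by this simple pattern.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+.*$", re.MULTILINE)


def chunk_by_heading(markdown: str) -> list[str]:
    """Split a Markdown document into sections that each start at a heading."""
    starts = [m.start() for m in HEADING_RE.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    windows: list[str] = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows
```

The two can be combined: split by heading first, then apply the sliding window only to sections that exceed the embedding model's input limit.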

STACK

About

rag demo with gui
