sachinML/self-correcting-rag

Self-Correcting RAG with AgentScope

A Retrieval-Augmented Generation system that automatically retries and improves its own answers when evaluation scores are low. Built entirely on AgentScope to demonstrate how the framework simplifies multi-agent pipelines.

A detailed blog post is available at this link.


What it does

Figure: Self-Correcting RAG flow (flowchart.svg)


AgentScope features used

| Feature | Where | What it replaces | What you get |
|---|---|---|---|
| AgentBase | All agents | Plain Python classes | async reply(), hook system, observe() for free |
| Msg + metadata | Everywhere | Raw dicts / function args | typed message bus between agents |
| sequential_pipeline | orchestrator.py | Manual a1 → a2 → a3 wiring | one call chains all three agents |
| MsgHub | orchestrator.py | Manual observe() calls | auto-broadcasts every reply to all participants |
| register_instance_hook / post_reply | logging_hooks.py | Logging scattered inside agents | attach once at setup, agents stay clean |
| GeminiChatModel | All LLM agents | Raw google.genai API calls | one object handles async, formatting, response parsing |
| GeminiChatFormatter | All LLM agents | Hand-crafting {"role": "user", "parts": [...]} dicts | converts Msg objects automatically |
| GeminiTextEmbedding | knowledge_builder.py | Custom embedding client | one object handles API calls + batching |
| FileEmbeddingCache | knowledge_builder.py | Custom disk cache | persists embeddings so Gemini API is called only once |
| VDBStoreBase | faiss_store.py | Coupled FAISS code | pluggable interface, swap for Qdrant/Milvus with zero code change |
| SimpleKnowledge | knowledge_builder.py | Manual embed-query + FAISS search + decode | one await kb.retrieve(query, limit=5) call |
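
To make the SimpleKnowledge row concrete, here is a rough sketch of the manual embed-query, search, and decode steps that a single retrieve call stands in for. It is plain Python with a brute-force cosine search, not FAISS or AgentScope code; `toy_embed`, `cosine`, `top_k`, and the sample passages are hypothetical stand-ins and do not appear in the project.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, passages: list[str], limit: int = 5) -> list[tuple[float, str]]:
    """Embed the query, score every passage, and return the best `limit` matches."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Made-up passages, only for illustration.
passages = [
    "a transformer uses attention layers",
    "federated learning trains models on devices",
    "attention lets a model weigh input tokens",
]
results = top_k("how does attention work", passages, limit=2)
for score, passage in results:
    print(f"{score:.3f}  {passage}")
```

A real vector store replaces the linear scan with an index, but the shape of the operation (embed, score, keep the top few) stays the same.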

Project structure

self-correcting-rag/
├── app.py                         # Streamlit UI: main entry point
├── setup.py                       # First-time setup: install deps + build index
├── flowchart.svg                  # Architecture diagram (used in this README)
├── requirements.txt
├── .env.example
│
├── src/
│   ├── config.py                  # Env loading, API keys, shared constants
│   ├── system.py                  # Builds the AgentScope orchestrator
│   │
│   ├── agents/
│   │   ├── retrieval_agent.py     # AgentBase: SimpleKnowledge.retrieve()
│   │   ├── generation_agent.py    # AgentBase: GeminiChatModel (Flash)
│   │   ├── evaluator_agent.py     # AgentBase: GeminiChatModel (Pro)
│   │   ├── query_rewriter_agent.py # AgentBase: GeminiChatModel (Flash)
│   │   └── orchestrator.py        # sequential_pipeline + MsgHub + retry loop
│   │
│   ├── rag/
│   │   ├── faiss_store.py         # VDBStoreBase implementation (FAISS)
│   │   └── knowledge_builder.py   # GeminiTextEmbedding + FileEmbeddingCache + SimpleKnowledge
│   │
│   ├── hooks/
│   │   └── logging_hooks.py       # post_reply hooks: transparent observability
│   │
│   ├── data/
│   │   └── fetch_wikipedia.py     # Fetches + chunks 30 Wikipedia articles
│   │
│   └── utils/
│       ├── prompts.py             # Prompt templates (generation, evaluation, rewrite)
│       └── logger.py              # Structured JSON logger per query run
│
├── data/                          # Created by setup.py
│   ├── chunks.json                # 286 Wikipedia chunks
│   ├── faiss.index                # FAISS binary index
│   ├── faiss_meta.json            # Document metadata
│   └── embed_cache/               # Cached Gemini embeddings (skip API on re-run)
│
└── logs/                          # Per-query JSON logs with all attempt details

Setup

1. Clone and create a virtual environment

git clone <repo-url>
cd self-correcting-rag
python -m venv agent-scope
source agent-scope/bin/activate   # Windows: agent-scope\Scripts\activate

2. Add your API key

cp .env.example .env
# open .env and set: GEMINI_API_KEY=your_key_here

3. Run setup (installs deps + builds knowledge base)

python setup.py

This installs all dependencies, fetches 30 Wikipedia articles, chunks them into ~286 passages, embeds them with gemini-embedding-001 via AgentScope's GeminiTextEmbedding, and builds a FAISS index. Embeddings are cached to disk, so subsequent runs skip the embedding API calls.
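
The idea behind the on-disk embedding cache can be sketched in a few lines of standard-library Python. This is not how FileEmbeddingCache is implemented; `DiskEmbeddingCache`, `fake_embed`, and the one-JSON-file-per-text layout are assumptions made purely for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


class DiskEmbeddingCache:
    """Hypothetical cache: one JSON file per text, keyed by the SHA-256 of the text."""

    def __init__(self, cache_dir: Path, embed_fn: Callable[[str], list[float]]) -> None:
        self.cache_dir = cache_dir
        self.embed_fn = embed_fn
        self.api_calls = 0  # counts how often the expensive embed_fn actually ran
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        path = self.cache_dir / f"{key}.json"
        if path.exists():
            # Cache hit: read the stored vector instead of calling the API.
            return json.loads(path.read_text())
        self.api_calls += 1
        vector = self.embed_fn(text)
        path.write_text(json.dumps(vector))
        return vector


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding API call."""
    return [float(len(text)), float(text.count(" "))]


cache_dir = Path(tempfile.mkdtemp())

first_run = DiskEmbeddingCache(cache_dir, fake_embed)
v1 = first_run.embed("retrieval-augmented generation")

# A fresh object pointed at the same directory simulates re-running setup.
second_run = DiskEmbeddingCache(cache_dir, fake_embed)
v2 = second_run.embed("retrieval-augmented generation")
```

In this sketch the second run returns the same vector without invoking `fake_embed`, which mirrors why a repeated setup does not need to re-embed unchanged chunks.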


Usage

streamlit run app.py

How the retry loop works

Each attempt runs through the full sequential_pipeline:

RetrievalAgent → GenerationAgent → EvaluatorAgent

The EvaluatorAgent (Gemini 2.5 Pro) scores the answer on:

  • Faithfulness: is the answer grounded in the retrieved context?
  • Relevance: does it actually answer the question?

If score < 0.75, the orchestrator picks a retry strategy based on what failed:

| failure_type | Strategy |
|---|---|
| retrieval_weak | QueryRewriterAgent rewrites the query to be more specific |
| hallucination | Re-run with a strict grounding prompt that bars any outside knowledge |
| answer_vague | Expand top_k to retrieve more context |

top_k escalates every attempt regardless: 2 → 5 → 8 → 10. The best-scoring attempt across all tries is returned.
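
The control flow described above can be sketched roughly as follows. This is not the code in orchestrator.py: `run_with_retries`, `Attempt`, `fake_pipeline`, and `fake_rewrite` are hypothetical names, the pipeline is a toy stub, and keeping the strict grounding prompt switched on for later attempts is an assumption. Only the 0.75 threshold, the top_k schedule, and the three failure types come from this README.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SCORE_THRESHOLD = 0.75          # from the README
TOP_K_SCHEDULE = [2, 5, 8, 10]  # from the README

PipelineFn = Callable[[str, int, bool], tuple[str, float, Optional[str]]]


@dataclass
class Attempt:
    query: str
    top_k: int
    strict_grounding: bool
    answer: str
    score: float
    failure_type: Optional[str]


def run_with_retries(query: str, pipeline: PipelineFn, rewrite: Callable[[str], str]) -> Attempt:
    """Run the pipeline once per top_k value and return the best-scoring attempt."""
    best: Optional[Attempt] = None
    strict = False
    for top_k in TOP_K_SCHEDULE:
        answer, score, failure = pipeline(query, top_k, strict)
        attempt = Attempt(query, top_k, strict, answer, score, failure)
        if best is None or attempt.score > best.score:
            best = attempt
        if score >= SCORE_THRESHOLD:
            break  # good enough, stop retrying
        if failure == "retrieval_weak":
            query = rewrite(query)
        elif failure == "hallucination":
            strict = True  # assumption: stays on for all later attempts
        # "answer_vague" needs no extra step here: top_k grows on the next pass.
    assert best is not None
    return best


def fake_pipeline(query: str, top_k: int, strict: bool) -> tuple[str, float, Optional[str]]:
    """Toy stand-in for RetrievalAgent -> GenerationAgent -> EvaluatorAgent."""
    if "specific" not in query:
        return "unsupported answer", 0.40, "retrieval_weak"
    if not strict:
        return "partly invented answer", 0.60, "hallucination"
    return "grounded answer", 0.90, None


def fake_rewrite(query: str) -> str:
    """Toy stand-in for QueryRewriterAgent."""
    return query + " (more specific)"


best = run_with_retries("what is RAG", fake_pipeline, fake_rewrite)
```

With these stubs the first attempt fails on retrieval, the second on hallucination, and the third passes the threshold at top_k = 8, so that attempt is returned.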


Models

| Model | Used for |
|---|---|
| gemini-2.5-flash | Answer generation, query rewriting |
| gemini-2.5-pro | Evaluation (stronger judge for faithfulness) |
| gemini-embedding-001 | Document and query embeddings (dim=3072) |

Knowledge base

30 Wikipedia topics covering ML/NLP/AI:

Artificial intelligence, Machine learning, Deep learning, Natural language processing, Transformer, BERT, GPT, Retrieval-augmented generation, Information retrieval, Knowledge graph, Neural network, RNN, CNN, Attention mechanism, Word embedding, Semantic search, Question answering, Text summarization, NER, Sentiment analysis, Transfer learning, Fine-tuning, Reinforcement learning, GAN, Autoencoder, VAE, Prompt engineering, Zero-shot learning, Few-shot learning, Federated learning
