A Retrieval-Augmented Generation system that automatically retries and improves its own answers when evaluation scores are low. Built entirely on AgentScope to demonstrate how the framework simplifies multi-agent pipelines.
A detailed blog post is available at this link.
| Feature | Where | What it replaces | What you get instead |
|---|---|---|---|
| `AgentBase` | All agents | Plain Python classes | Async `reply()`, a hook system, and `observe()` for free |
| `Msg` + metadata | Everywhere | Raw dicts / function args | A typed message bus between agents |
| `sequential_pipeline` | `orchestrator.py` | Manual a1 → a2 → a3 wiring | One call chains all three agents |
| `MsgHub` | `orchestrator.py` | Manual `observe()` calls | Every reply is broadcast to all participants automatically |
| `register_instance_hook` / `post_reply` | `logging_hooks.py` | Logging scattered inside agents | Hooks attached once at setup, so agents stay clean |
| `GeminiChatModel` | All LLM agents | Raw `google.genai` API calls | One object handles async calls, formatting, and response parsing |
| `GeminiChatFormatter` | All LLM agents | Hand-crafted `{"role": "user", "parts": [...]}` dicts | `Msg` objects are converted automatically |
| `GeminiTextEmbedding` | `knowledge_builder.py` | Custom embedding client | One object handles API calls and batching |
| `FileEmbeddingCache` | `knowledge_builder.py` | Custom disk cache | Embeddings persist on disk, so the Gemini API is called only once per text |
| `VDBStoreBase` | `faiss_store.py` | Tightly coupled FAISS code | A pluggable interface: Qdrant/Milvus can be swapped in without changing agent or pipeline code |
| `SimpleKnowledge` | `knowledge_builder.py` | Manual query embedding + FAISS search + decoding | A single `await kb.retrieve(query, limit=5)` call |
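The `VDBStoreBase` and `SimpleKnowledge` rows describe callers that depend on a vector-store interface rather than on FAISS itself. The framework-free sketch below illustrates that idea. `VectorStore`, `InMemoryStore` and `retrieve` are hypothetical names, not AgentScope's actual classes or method signatures, and the brute-force cosine search stands in for the FAISS backend in `faiss_store.py`.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface (AgentScope's VDBStoreBase defines its own methods)."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...

    def search(self, query: list[float], limit: int) -> list[tuple[str, float]]: ...


def _normalize(v: list[float]) -> list[float]:
    # Unit-length vectors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


class InMemoryStore:
    """Brute-force cosine-similarity store; a stand-in for a FAISS-backed one."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._rows.append((doc_id, _normalize(vector)))

    def search(self, query: list[float], limit: int) -> list[tuple[str, float]]:
        q = _normalize(query)
        scored = [(doc_id, sum(a * b for a, b in zip(q, v))) for doc_id, v in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


def retrieve(store: VectorStore, query_vec: list[float], limit: int = 5) -> list[str]:
    """Depends only on the interface, so any backend with add/search works."""
    return [doc_id for doc_id, _ in store.search(query_vec, limit)]
```

Swapping backends then means writing another class with the same `add`/`search` methods; `retrieve` and its callers stay unchanged.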
self-correct-rag/
├── app.py # Streamlit UI: main entry point
├── setup.py # First-time setup: install deps + build index
├── flowchart.svg # Architecture diagram (used in this README)
├── requirements.txt
├── .env.example
│
├── src/
│ ├── config.py # Env loading, API keys, shared constants
│ ├── system.py # Builds the AgentScope orchestrator
│ │
│ ├── agents/
│ │ ├── retrieval_agent.py # AgentBase: SimpleKnowledge.retrieve()
│ │ ├── generation_agent.py # AgentBase: GeminiChatModel (Flash)
│ │ ├── evaluator_agent.py # AgentBase: GeminiChatModel (Pro)
│ │ ├── query_rewriter_agent.py # AgentBase: GeminiChatModel (Flash)
│ │ └── orchestrator.py # sequential_pipeline + MsgHub + retry loop
│ │
│ ├── rag/
│ │ ├── faiss_store.py # VDBStoreBase implementation (FAISS)
│ │ └── knowledge_builder.py # GeminiTextEmbedding + FileEmbeddingCache + SimpleKnowledge
│ │
│ ├── hooks/
│ │ └── logging_hooks.py # post_reply hooks: transparent observability
│ │
│ ├── data/
│ │ └── fetch_wikipedia.py # Fetches + chunks 30 Wikipedia articles
│ │
│ └── utils/
│ ├── prompts.py # Prompt templates (generation, evaluation, rewrite)
│ └── logger.py # Structured JSON logger per query run
│
├── data/ # Created by setup.py
│ ├── chunks.json # 286 Wikipedia chunks
│ ├── faiss.index # FAISS binary index
│ ├── faiss_meta.json # Document metadata
│ └── embed_cache/ # Cached Gemini embeddings (skip API on re-run)
│
└── logs/ # Per-query JSON logs with all attempt details
git clone <repo-url>
cd self-correct-rag
python -m venv agent-scope
source agent-scope/bin/activate # Windows: agent-scope\Scripts\activate
cp .env.example .env
# open .env and set: GEMINI_API_KEY=your_key_here
python setup.py

This installs all dependencies, fetches 30 Wikipedia articles, chunks them into ~286 passages, embeds them with gemini-embedding-001 via AgentScope's GeminiTextEmbedding, and builds a FAISS index. Embeddings are cached to disk, so subsequent runs skip the embedding API calls.
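One way to picture the embedding cache is as a content-addressed file store: hash the model name plus the text, and call the embedding API only on a cache miss. This is a hypothetical sketch, not AgentScope's `FileEmbeddingCache`; `embed_with_cache`, the one-JSON-file-per-text layout, and the `embed_fn` callable (standing in for the Gemini call, assumed to return vectors in input order) are all assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional


def _cache_key(model: str, text: str) -> str:
    # Include the model name so switching embedding models never reuses stale vectors.
    return hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()


def embed_with_cache(
    texts: list[str],
    model: str,
    cache_dir: Path,
    embed_fn: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Return one vector per text, calling embed_fn only for uncached texts."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results: list[Optional[list[float]]] = [None] * len(texts)
    misses: list[int] = []

    for i, text in enumerate(texts):
        path = cache_dir / f"{_cache_key(model, text)}.json"
        if path.exists():
            results[i] = json.loads(path.read_text())
        else:
            misses.append(i)

    if misses:
        # One batched call covers every cache miss.
        fresh = embed_fn([texts[i] for i in misses])
        for i, vector in zip(misses, fresh):
            path = cache_dir / f"{_cache_key(model, texts[i])}.json"
            path.write_text(json.dumps(vector))
            results[i] = vector

    # Every slot is filled at this point.
    return [vec for vec in results if vec is not None]
```

A second run over the same texts reads everything from disk and never calls `embed_fn`, which is the behavior the setup step relies on.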
Then launch the Streamlit UI:

streamlit run app.py

For each question, every attempt runs through the full sequential_pipeline:
RetrievalAgent → GenerationAgent → EvaluatorAgent
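Conceptually, a sequential pipeline feeds each agent's reply into the next agent, and a post-reply hook sees every reply without the agent's own code knowing about it. The plain-asyncio sketch below mirrors that shape. `Message`, `Agent`, `run_sequential` and the three stub steps are hypothetical stand-ins, not AgentScope's `Msg`, `AgentBase` or `sequential_pipeline`, and the evaluator's fixed 0.9 score is a placeholder for the LLM judge.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Message:
    content: str
    metadata: dict = field(default_factory=dict)


Hook = Callable[[str, Message], None]


class Agent:
    def __init__(self, name: str, step: Callable[[Message], Awaitable[Message]]) -> None:
        self.name = name
        self._step = step
        self.post_reply_hooks: list[Hook] = []

    async def __call__(self, msg: Message) -> Message:
        reply = await self._step(msg)
        for hook in self.post_reply_hooks:
            hook(self.name, reply)  # observability lives outside the agent logic
        return reply


async def run_sequential(agents: list[Agent], msg: Message) -> Message:
    # Each agent's output becomes the next agent's input.
    for agent in agents:
        msg = await agent(msg)
    return msg


async def retrieve_step(msg: Message) -> Message:
    context = [f"passage about {msg.content}"]  # placeholder for a vector search
    return Message(msg.content, {**msg.metadata, "query": msg.content, "context": context})


async def generate_step(msg: Message) -> Message:
    answer = f"Answer drawn from: {msg.metadata['context'][0]}"  # placeholder for the LLM
    return Message(answer, dict(msg.metadata))


async def evaluate_step(msg: Message) -> Message:
    score = 0.9 if msg.metadata.get("context") else 0.0  # placeholder judge
    return Message(msg.content, {**msg.metadata, "score": score})


def build_pipeline(log: list[str]) -> list[Agent]:
    agents = [
        Agent("RetrievalAgent", retrieve_step),
        Agent("GenerationAgent", generate_step),
        Agent("EvaluatorAgent", evaluate_step),
    ]
    for agent in agents:
        # Attached once at setup; the agents themselves contain no logging code.
        agent.post_reply_hooks.append(lambda name, reply: log.append(name))
    return agents
```

Running `asyncio.run(run_sequential(build_pipeline(log), Message("What is BERT?")))` returns the evaluator's message, and `log` records the three agent names in order.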
The EvaluatorAgent (Gemini 2.5 Pro) scores the answer on:
- Faithfulness: is the answer grounded in the retrieved context?
- Relevance: does it actually answer the question?
If score < 0.75, the orchestrator picks a retry strategy based on what failed:
| `failure_type` | Strategy |
|---|---|
| `retrieval_weak` | `QueryRewriterAgent` rewrites the query to be more specific |
| `hallucination` | Re-run with a strict grounding prompt that bars any outside knowledge |
| `answer_vague` | Expand `top_k` to retrieve more context |
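The strategy table amounts to a dispatch on `failure_type`. The sketch below is a hypothetical illustration: `RetryPlan`, `plan_retry` and the `rewrite` callable (standing in for `QueryRewriterAgent`) are assumed names, and how much `expand_context` actually widens retrieval is left to the caller because the README does not specify it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPlan:
    query: str
    strict_grounding: bool = False  # hallucination: answer from retrieved context only
    expand_context: bool = False    # answer_vague: retrieve more passages next time


def plan_retry(failure_type: str, query: str, rewrite: Callable[[str], str]) -> RetryPlan:
    """Map the evaluator's failure_type to the settings for the next attempt."""
    if failure_type == "retrieval_weak":
        # Make the query more specific (QueryRewriterAgent's role in this project).
        return RetryPlan(query=rewrite(query))
    if failure_type == "hallucination":
        # Same query, but switch to a strict grounding prompt.
        return RetryPlan(query=query, strict_grounding=True)
    if failure_type == "answer_vague":
        # Same query, but ask for more retrieved context.
        return RetryPlan(query=query, expand_context=True)
    raise ValueError(f"unknown failure_type: {failure_type!r}")
```

Raising on an unknown label, rather than silently falling back, makes drift in the evaluator's output format visible immediately.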
`top_k` escalates on every attempt regardless of the failure type: 2 → 5 → 8 → 10. The best-scoring attempt across all tries is returned.
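The escalation and best-of selection can be sketched as a loop over a fixed `top_k` schedule with the 0.75 threshold. `run_attempt` is a hypothetical callable standing in for one full retrieve → generate → evaluate pass (strategy selection is omitted here), and stopping as soon as an attempt clears the threshold is an assumption consistent with retrying only on low scores.

```python
from typing import Callable, NamedTuple, Optional

SCORE_THRESHOLD = 0.75
TOP_K_SCHEDULE = (2, 5, 8, 10)


class Attempt(NamedTuple):
    top_k: int
    answer: str
    score: float


def answer_with_retries(run_attempt: Callable[[int], Attempt]) -> Attempt:
    """Try each top_k in turn; return the best attempt seen."""
    best: Optional[Attempt] = None
    for top_k in TOP_K_SCHEDULE:
        attempt = run_attempt(top_k)
        if best is None or attempt.score > best.score:
            best = attempt
        if attempt.score >= SCORE_THRESHOLD:
            break  # good enough; no further retries
    assert best is not None  # the schedule is non-empty
    return best
```

Keeping the best attempt, rather than the last one, guards against a later retry scoring worse than an earlier one.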
| Model | Used for |
|---|---|
| `gemini-2.5-flash` | Answer generation, query rewriting |
| `gemini-2.5-pro` | Evaluation (stronger judge for faithfulness) |
| `gemini-embedding-001` | Document and query embeddings (dim = 3072) |
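For a sense of scale, the raw vector storage for the corpus can be estimated from the chunk count and the embedding dimension. This assumes uncompressed float32 vectors, as a flat FAISS index stores them; the README does not say which index type `faiss_store.py` uses, and the estimate ignores metadata and index overhead.

```python
NUM_CHUNKS = 286         # chunks in data/chunks.json
EMBED_DIM = 3072         # gemini-embedding-001 output dimension
BYTES_PER_FLOAT32 = 4


def flat_index_bytes(n: int, dim: int) -> int:
    """Raw storage for n uncompressed float32 vectors of the given dimension."""
    return n * dim * BYTES_PER_FLOAT32


size = flat_index_bytes(NUM_CHUNKS, EMBED_DIM)
print(size, f"{size / 2**20:.2f} MiB")  # prints: 3514368 3.35 MiB
```

At this size, brute-force search over every vector is cheap, so an exact index is a reasonable default.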
30 Wikipedia topics covering ML/NLP/AI:
Artificial intelligence, Machine learning, Deep learning, Natural language processing, Transformer, BERT, GPT, Retrieval-augmented generation, Information retrieval, Knowledge graph, Neural network, RNN, CNN, Attention mechanism, Word embedding, Semantic search, Question answering, Text summarization, NER, Sentiment analysis, Transfer learning, Fine-tuning, Reinforcement learning, GAN, Autoencoder, VAE, Prompt engineering, Zero-shot learning, Few-shot learning, Federated learning