Advanced document analysis system using OpenAI Agents SDK with autonomous multi-agent orchestration, RAG pipeline, and interactive UI.
This system implements a sophisticated multi-agent architecture using OpenAI Agents SDK (v0.6.1) for intelligent PDF document analysis. It features autonomous intent detection, retrieval-augmented generation (RAG), specialized reasoning agents, and an interactive Streamlit interface with citation highlighting.
```
User Query → Planner Agent (Intent Detection)
        ↓
Appropriate Agent Chain
        ↓
RAG Agent (Retrieval + Generation)
        ↓
Specialized Reasoning Agent
        ↓
Response with Cited Evidence
```
- Planner Agent - Autonomous orchestrator using handoffs
- RAG Agent - Retrieval-augmented generation with FAISS
- Summarization Agent - Full-document summarization
- Comparator Agent - Cross-document comparison analysis
- Timeline Builder Agent - Chronological event organization
- Aggregator Agent - Multi-source information synthesis
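In the real system the routing decision is made by the LLM through handoffs, not by hand-written rules. Purely as an illustration of the decision the Planner makes, here is a hedged keyword-based stand-in (the function name and keywords are invented for this sketch):

```python
# Illustrative only: the actual Planner Agent delegates via LLM-driven
# handoffs. This keyword router merely sketches the routing decision.
def route_query(query: str) -> str:
    q = query.lower()
    if "compare" in q or "difference" in q:
        return "Comparator Agent"
    if "timeline" in q or "chronolog" in q:
        return "Timeline Builder Agent"
    if "summar" in q:
        return "Summarization Agent"
    return "RAG Agent"  # default: grounded retrieval + generation

print(route_query("Compare the methodologies across these papers"))
```

The LLM-driven version handles paraphrases ("how do these papers differ?") that a keyword table would miss, which is why the system uses handoffs instead of manual mode selection.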
| Component | Technology |
|---|---|
| Agent Framework | OpenAI Agents SDK v0.6.1 |
| LLM | OpenAI (provider-agnostic) |
| Vector Database | FAISS (IndexFlatIP) |
| Embeddings | sentence-transformers (384-dim) |
| PDF Processing | pdfplumber + PyMuPDF |
| UI Framework | Streamlit |
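FAISS's `IndexFlatIP` performs exact inner-product search; with L2-normalized embeddings (an assumption of this sketch about how the 384-dim vectors are prepared), inner product equals cosine similarity. The NumPy toy below mimics that behavior without requiring FAISS itself:

```python
import numpy as np

# Toy stand-in for FAISS IndexFlatIP: exact inner-product search.
# With L2-normalized vectors, inner product == cosine similarity.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 384)).astype("float32")   # 100 chunk embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalize rows

# A query very close to chunk 42
query = corpus[42] + 0.01 * rng.normal(size=384).astype("float32")
query /= np.linalg.norm(query)

scores = corpus @ query            # inner products against every chunk
top_k = np.argsort(-scores)[:5]    # TOP_K_RETRIEVAL=5
print(top_k[0])                    # → 42, the nearest chunk
```

The real vector store wraps this same operation behind FAISS, which adds the indexing machinery needed once the corpus grows beyond what a brute-force matrix product handles comfortably.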
- Autonomous Intent Detection - No manual mode selection
- RAG Pipeline - Semantic search with grounded responses
- Multi-Document Analysis - Cross-document retrieval
- Citation Tracking - Every answer includes ranked evidence
- Interactive PDF Viewer - Click-to-navigate with highlighting
- Agent Orchestration - Dynamic agent chaining
- Tool Calling - Agents call Python functions via `@function_tool`
- Autonomous Handoffs - LLM-driven delegation (no manual routing)
- Global State Management - Tools access a shared vector store
- Evidence Highlighting - Yellow highlights on cited passages
- Execution Tracing - Transparent agent workflow via Runner logs
- Python 3.9+
- OpenAI API key or a Gemini API key
```bash
# Clone repository
git clone <repository-url>
cd pdf_agent_system

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
```

Edit `.env` and add your API key:

```bash
OPENAI_API_KEY=your_key_here
```

Then launch the app:

```bash
streamlit run app.py
```

Access the application at: http://localhost:8501
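Before launching, it can be worth confirming the key is actually visible to the process. The helper below is hypothetical (not part of the repo), just a quick sanity check:

```python
import os

def api_key_status(env: dict) -> str:
    """Hypothetical helper: verify an OpenAI key is present and
    well-formed before running `streamlit run app.py`."""
    key = env.get("OPENAI_API_KEY", "")
    return "ok" if key.startswith("sk-") else "missing or malformed"

print(api_key_status(os.environ.copy()))
```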
```
pdf_agent_system/
├── agents/
│   ├── __init__.py
│   ├── tools.py                 # Standalone tools for SDK agents
│   ├── rag_agent.py             # RAG Agent definition
│   ├── summarization_agent.py   # Summarization Agent definition
│   ├── specialized_agents.py    # Reasoning Agents (Comparator, Timeline, etc.)
│   └── planner_agent.py         # Orchestrator with Handoffs
├── utils/
│   ├── __init__.py
│   ├── state.py                 # Singleton for tool access
│   ├── pdf_processor.py         # PDF extraction + chunking
│   └── vector_store.py          # FAISS vector database
├── config/
│   ├── __init__.py
│   └── settings.py              # Configuration
├── app.py                       # Streamlit UI
├── requirements.txt             # Dependencies
├── .env.example                 # Configuration template
├── .gitignore                   # Git ignore rules
└── README.md                    # This file
```
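Because `@function_tool` functions are module-level and the SDK's tool-call interface only passes the LLM's arguments, tools reach the vector store through the singleton in `utils/state.py`. A hedged sketch of that pattern (the exact class and attribute names are assumptions):

```python
class GlobalState:
    """Sketch of the utils/state.py singleton: a process-wide object
    so @function_tool functions can reach the shared vector store
    without threading it through every tool-call signature."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.vector_store = None  # set once PDFs are indexed
        return cls._instance

global_state = GlobalState()
assert global_state is GlobalState()  # every import sees the same object
```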
We use the native Agent and Runner primitives:
```python
from agents import Agent, Runner

# Agents invoke tools and hand off to other agents
result = Runner.run_sync(planner_agent, user_query)
print(result.final_output)
```

Tools are defined with the `@function_tool` decorator and access shared state:
```python
@function_tool
def retrieve_documents(query: str):
    """Retrieve relevant chunks"""
    return global_state.vector_store.search(query)
```

The Planner Agent uses its instructions and the `handoffs` list to route dynamically:
```python
planner_agent = Agent(
    name="Planner",
    instructions="Route queries to the correct specialist...",
    handoffs=[rag_agent, summarization_agent, comparator_agent],
)
```

User: "What are the main findings in the research paper?"
System Flow:
- Planner delegates to RAG Agent
- RAG Agent calls 'retrieve_documents' tool
- RAG Agent generates an answer with citations
Output:
Answer: "The research identifies three main findings: [1] X, [2] Y, [3] Z"
User: "Compare the methodologies across these papers"
System Flow:
- Planner delegates to RAG Agent
- RAG Agent retrieves methodology sections
- RAG Agent hands off to Comparator Agent
- Comparator Agent analyzes differences
Output:
Structured comparison with specific examples
```bash
# Required
OPENAI_API_KEY=sk-your-key-here

# Optional (defaults shown)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RETRIEVAL=5
```

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gpt-4o-mini | $0.150 | $0.600 |
| gpt-4o | $2.50 | $10.00 |
- Per query (gpt-4o-mini): ~2,000 input + ~500 output tokens ≈ $0.0006
- Per session: ~10 queries ≈ $0.006
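The per-query figure follows directly from the gpt-4o-mini rates in the table above; the arithmetic can be reproduced in a few lines:

```python
# Reproduce the per-query estimate from the gpt-4o-mini rates above.
INPUT_RATE = 0.150 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.600 / 1_000_000  # $ per output token

per_query = 2_000 * INPUT_RATE + 500 * OUTPUT_RATE
per_session = 10 * per_query
print(f"${per_query:.4f} per query, ${per_session:.3f} per session")
# → $0.0006 per query, $0.006 per session
```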
- OpenAI - Agents SDK framework
- Facebook Research - FAISS vector search
- Sentence Transformers - Embedding models
- Streamlit - Interactive UI framework
✨ Built with OpenAI Agents SDK v0.6.1 | Multi-Agent Architecture ✨