A comprehensive AI-powered research system that conducts thorough investigations on any topic using structured planning, multi-source web research, and professional report generation.
```bash
# Install dependencies
pip install -r requirements.txt

# Set up your API keys in .env
cp .env.template .env
# Edit .env with your keys

# Run a research query
python main.py "Arkel.ai french company"

# With custom settings (fast research)
python main.py "Tesla sustainability initiatives" --provider cerebras --max-iter 75

# Deep analysis with reasoning mode
python main.py "Impact of quantum computing on cryptography" --reasoning --max-iter 100
```

The Deep Research System is an autonomous AI research assistant built with CrewAI that can:
- Plan research investigations with structured TODO lists
- Search the web using advanced AI-powered search (Exa AI)
- Extract full content from specific webpages
- Analyze information from multiple sources
- Synthesize findings into comprehensive, well-formatted reports
- 🎯 Interactive CLI - Command-line interface with customizable parameters
- 📊 Reactive Dashboard - Live-updating display that refreshes in place (no scrolling!)
- 📋 Status Tracking - Visual status indicators (⏳ pending → 🔄 in_progress → ✅ completed)
- 🤖 Multi-Tool Research - Web search, content extraction, and deep analysis
- 📝 Structured Planning - Automatic breakdown into 10-15 actionable steps
- 📄 Professional Reports - Markdown-formatted output with citations
- 🎨 Highly Customizable - Adjust model, provider, depth, and creativity via CLI
- ⚡ Provider Selection - Choose specific LLM providers (Cerebras, OpenAI, Anthropic, etc.)
- 🧠 Reasoning Mode - Enable extended reasoning for complex topics (up to 8192 tokens)
- 🔇 Clean Output - Third-party logs suppressed by default
- Research Agent - A unified AI agent that handles planning, research, and writing
- Research Tools - Three specialized tools for different research tasks
- Task Orchestration - CrewAI-based workflow management
- Persistent Planning - JSON-based TODO tracking system
| Tool | Purpose | Key Features |
|---|---|---|
| `update_research_plan` | Create and manage TODO lists | Persistent JSON storage; status tracking (pending/in_progress/completed); merge updates |
| `web_search_tool` | Search the web for information | AI-powered search (Exa AI); domain filtering; category filtering (company, news, linkedin, pdf, etc.); search types (auto/neural/deep) |
| `get_webpage_content` | Extract full webpage content | Live crawling; AI summaries; link extraction; subpage crawling |
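The table notes that `update_research_plan` merges updates into a persistent JSON plan. A minimal sketch of what such merge-by-id behavior could look like (illustrative only; the helper name is hypothetical, field names follow the JSON example later in this README, and the real tool's internals may differ):

```python
import json
from pathlib import Path

PLAN_FILE = Path(".research_plan.json")

def merge_plan_updates(updates, explanation=""):
    """Merge TODO updates into the persisted plan, keyed by 'id'.

    Items with a known id are updated in place (e.g. a status change);
    new ids are appended. Hypothetical helper, not the project's code.
    """
    plan = {"todos": []}
    if PLAN_FILE.exists():
        plan = json.loads(PLAN_FILE.read_text())
    by_id = {t["id"]: t for t in plan.get("todos", [])}
    for item in updates:
        if item["id"] in by_id:
            by_id[item["id"]].update(item)  # partial update, keeps other fields
        else:
            by_id[item["id"]] = item        # new TODO entry
    plan["todos"] = list(by_id.values())
    plan["explanation"] = explanation
    PLAN_FILE.write_text(json.dumps(plan, indent=2))
    return plan
```

This keeps partial updates cheap: marking a step `completed` only needs its `id` and the new `status`, while the stored `content` survives the merge.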
```mermaid
flowchart TD
    Start([User Query]) --> Plan[Create Research Plan]
    Plan --> TODO[Generate 10-15 TODO Items]
    TODO --> Save[Save to .research_plan.json]
    Save --> Loop{More TODOs?}
    Loop -->|Yes| Select[Select Next TODO]
    Select --> MarkIP[Mark as 'in_progress']
    MarkIP --> Search[Web Search]
    Search --> Filter{Need Specific<br/>Content?}
    Filter -->|Yes| GetContent[Get Webpage Content]
    Filter -->|No| Analyze[Analyze & Extract Info]
    GetContent --> Analyze
    Analyze --> MarkDone[Mark TODO as 'completed']
    MarkDone --> Loop
    Loop -->|No| Compile[Compile Final Report]
    Compile --> Format[Format with Markdown]
    Format --> Sources[Add Sources & Citations]
    Sources --> Insights[Highlight Interesting Findings]
    Insights --> Output([Comprehensive Report])

    style Start fill:#e1f5ff
    style Plan fill:#fff4e1
    style Search fill:#f0e1ff
    style GetContent fill:#f0e1ff
    style Compile fill:#e1ffe1
    style Output fill:#e1f5ff
```
When you submit a research query, the agent:
- Analyzes the query to understand intent and scope
- Breaks it down into 10-15 specific, actionable research steps
- Creates a TODO list using `update_research_plan`
- Saves the plan to `.research_plan.json`
Example TODO Structure:

```json
{
  "explanation": "Creating research plan for Arkel.ai",
  "updated_at": "2025-11-16 13:00:00",
  "todos": [
    {
      "id": "step-1",
      "status": "pending",
      "content": "Research company background and history"
    },
    {
      "id": "step-2",
      "status": "pending",
      "content": "Identify key products and services"
    }
  ]
}
```

For each TODO item, the agent:
- Marks the item as "in_progress"
- Searches the web using `web_search_tool`:
  - Can filter by domain (e.g., only LinkedIn)
  - Can filter by category (company, news, linkedin profile, pdf)
  - Can choose search type (auto, neural, deep)
- Extracts content from promising URLs using `get_webpage_content`:
  - Fetches full text content
  - Gets AI-generated summaries
  - Extracts related links
- Marks the item as "completed"
Tool Usage Examples:
```python
# Search for company information
web_search_tool(
    query="Arkel.ai company",
    category="company",
    search_type="neural"
)

# Get full content from specific URLs
get_webpage_content(
    urls=["https://arkel.ai", "https://arkel.ai/about"]
)

# Search LinkedIn for key persons
web_search_tool(
    query="Arkel.ai founders CEO",
    include_domains=["linkedin.com"],
    category="linkedin profile"
)
```

After completing all research TODOs, the agent:
- Compiles all gathered information
- Organizes findings by topic
- Creates a structured markdown report with:
  - Clear sections and headers
  - Bullet points and tables
  - Citations for all sources
  - Interesting findings highlighted
Report Structure:

```markdown
# [Report Title]

## Executive Summary
...

## Main Findings

### Topic 1
...

### Topic 2
...

## Interesting Findings
- Surprising fact 1
- Surprising fact 2

## Sources
1. [Source Title](URL)
2. [Source Title](URL)
```

Prerequisites:

- Python 3.11+
- OpenAI API key (or compatible API via OpenRouter)
- Exa AI API key
- Clone the repository

```bash
git clone <repository-url>
cd deep-research
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up environment variables

Create a .env file:

```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # or your OpenRouter URL
EXAAI_API_KEY=your_exa_api_key
```

Environment Variables:
The system reads configuration from .env:

```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # or your OpenRouter URL
EXAAI_API_KEY=your_exa_api_key
```

Runtime Configuration:

Use command-line arguments to configure behavior without editing code:

```bash
python main.py "Your query" --model MODEL --max-iter N --temperature T
```

See `python main.py --help` for all available options.
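For illustration, the .env values above could be read without any extra dependency. A minimal stdlib-only sketch (an assumption for clarity; the actual project may rely on python-dotenv or a similar library instead):

```python
import os
from pathlib import Path

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments, strips inline '# ...' comments,
    and never overrides variables already set in the environment.
    Hypothetical helper -- not the project's actual loader.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env_file()
api_key = os.environ.get("OPENAI_API_KEY")
```

Using `setdefault` means real environment variables take precedence over the .env file, which is the conventional behavior.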
Basic Usage:
```bash
# Use default query
python main.py

# Research a specific topic
python main.py "Your research topic here"

# Research with custom model
python main.py "Exaion French company" --model openrouter/anthropic/claude-3.5-sonnet

# Research with custom settings
python main.py "AI in healthcare" --max-iter 100 --temperature 0.2
```

Command-Line Arguments:
```bash
python main.py --help
```

Available options:

- `query` - Research topic (positional argument, optional)
- `--model` - LLM model to use (default: claude-haiku-4.5)
- `--max-iter` - Maximum agent iterations (default: 50)
- `--temperature` - LLM temperature for creativity (default: 0.0)
- `--provider` - Specify provider(s) to use (e.g., cerebras, openai, anthropic)
- `--reasoning` - Enable reasoning mode for deeper analysis (default: disabled)
- `--verbose` - Enable verbose logging output (default: disabled)
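A rough sketch of how these flags might be declared with argparse (defaults are taken from this README; the real main.py may structure this differently):

```python
import argparse

def build_parser():
    """Declare the CLI flags documented in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Deep Research System")
    parser.add_argument("query", nargs="?", default=None,
                        help="Research topic (optional)")
    parser.add_argument("--model", default="claude-haiku-4.5",
                        help="LLM model to use")
    parser.add_argument("--max-iter", type=int, default=50,
                        help="Maximum agent iterations")
    parser.add_argument("--temperature", type=float, default=0.0,
                        help="LLM temperature for creativity")
    parser.add_argument("--provider", default=None,
                        help="Provider(s), e.g. 'cerebras' or 'cerebras,openai'")
    parser.add_argument("--reasoning", action="store_true",
                        help="Enable reasoning mode for deeper analysis")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging output")
    return parser

# Example parse with an explicit argument list:
args = build_parser().parse_args(["AI in healthcare", "--max-iter", "100"])
```

Note that `--max-iter` maps to the attribute `args.max_iter` (argparse converts dashes to underscores), and `nargs="?"` is what makes the query positional yet optional.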
Examples:
```bash
# Quick research with fewer iterations
python main.py "Tesla stock analysis" --max-iter 30

# Creative research with higher temperature
python main.py "Future of renewable energy" --temperature 0.7

# Use a more powerful model
python main.py "Quantum computing applications" \
  --model openrouter/anthropic/claude-3.5-sonnet \
  --max-iter 75

# Enable verbose logging for debugging
python main.py "Research topic" --verbose

# Use a specific provider (e.g., Cerebras for speed)
python main.py "AI market analysis" --provider cerebras

# Use multiple providers as fallback
python main.py "Research topic" --provider "cerebras,openai"

# Enable reasoning mode for deeper analysis
python main.py "Complex research topic" --reasoning

# Combined: complex research with reasoning, custom provider, and more iterations
python main.py "Quantum computing applications in drug discovery" \
  --reasoning \
  --provider anthropic \
  --max-iter 75 \
  --temperature 0.2
```

The system will:
- Clean up any previous research plan (each session starts fresh)
- Create a new research plan
- Display TODO list updates in real-time as research progresses
- Show status changes (⏳ pending → 🔄 in_progress → ✅ completed)
- Generate and display a comprehensive final report
Real-Time Monitoring:
The system displays a reactive dashboard that updates in place (no scrolling):

```
📋 Research Plan Monitor
============================================================
⏱️ Updated: 2025-11-16 13:30:45
🔄 Starting research on company background
------------------------------------------------------------
📊 Progress: 10 tasks total
   🔄 in_progress: 1
   ⏳ pending: 7
   ✅ completed: 2

📝 Current Plan:
  1. ✅ [completed  ] Research company background and history
  2. ✅ [completed  ] Identify key products and services
  3. 🔄 [in_progress] Find information about leadership team
  4. ⏳ [pending    ] Analyze market position
  ...
============================================================
```

The display updates in place every 2 seconds - no scrolling, just live status changes! Tasks automatically move through: ⏳ → 🔄 → ✅
Technical Note: The reactive display uses ANSI escape codes (`\033[s`, `\033[u`, `\033[J`) to save/restore the cursor position and clear content, creating a dashboard-like experience. This works in all modern terminals (bash, zsh, PowerShell, etc.).
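As a sketch of that technique (illustrative only, not the project's actual monitor code): save the cursor position once, then on each refresh restore it and clear downward before reprinting the frame.

```python
import sys
import time

# ANSI escape codes named in the technical note above.
SAVE, RESTORE, CLEAR_DOWN = "\033[s", "\033[u", "\033[J"

def render(statuses):
    """Build one dashboard frame from a {task: status} mapping."""
    icons = {"pending": "⏳", "in_progress": "🔄", "completed": "✅"}
    lines = ["📋 Research Plan Monitor", "=" * 60]
    for i, (task, status) in enumerate(statuses.items(), 1):
        lines.append(f"  {i}. {icons[status]} [{status:<11}] {task}")
    return "\n".join(lines)

def monitor(frames, interval=2.0):
    """Redraw each frame in place instead of scrolling."""
    sys.stdout.write(SAVE)                      # remember cursor position
    for statuses in frames:
        sys.stdout.write(RESTORE + CLEAR_DOWN)  # jump back and wipe old frame
        sys.stdout.write(render(statuses) + "\n")
        sys.stdout.flush()
        time.sleep(interval)
```

The same pattern applies whether the frames come from a list (as here) or from polling `.research_plan.json` on a timer.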
The system uses a hybrid approach with both logging and print statements:
- Print statements: User-facing output (banners, progress, reports)
- Logging system: Diagnostic information, errors, system events
Third-party library logs (LiteLLM, httpx, OpenAI) are suppressed by default to keep output clean.
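One common way to achieve that suppression (a sketch; the project's app/config.py may do this differently) is to raise the level on the noisy third-party loggers while leaving the application logger alone:

```python
import logging

def configure_logging(verbose=False):
    """Keep user-facing output clean: quiet third-party loggers unless verbose."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )
    if not verbose:
        # Silence the chatty dependencies named in this README.
        for name in ("LiteLLM", "httpx", "openai"):
            logging.getLogger(name).setLevel(logging.WARNING)

configure_logging(verbose=False)
log = logging.getLogger("deep-research")
```

Because logger levels are per-name, INFO messages from `deep-research` still appear while LiteLLM and httpx only surface warnings and errors.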
Enable verbose logging for debugging:

```bash
python main.py "Research topic" --verbose
```

This will show detailed logs including third-party library information:

```
2025-11-16 13:30:00 - deep-research - INFO - Deep Research System initialized - Query: 'Research topic'
2025-11-16 13:30:00 - deep-research - INFO - Configuration: model=claude-haiku-4.5, max_iter=50, temperature=0.0
2025-11-16 13:30:01 - deep-research - INFO - Starting real-time plan monitoring
2025-11-16 13:30:01 - deep-research - INFO - Plan monitoring thread started
2025-11-16 13:30:02 - deep-research - INFO - Starting research execution
2025-11-16 13:30:02 - LiteLLM - INFO - LiteLLM completion() model= openrouter/anthropic/claude-haiku-4.5
...
```
Redirect logs separately:
```bash
# Save report to file, see logs in terminal
python main.py "topic" > report.md

# Save logs to file, see report in terminal
python main.py "topic" 2> logs.txt

# Save both separately
python main.py "topic" > report.md 2> logs.txt
```

Domain-Specific Search:
```python
web_search_tool(
    query="AI research papers",
    include_domains=["arxiv.org", "paperswithcode.com"]
)
```

Category Filtering:
```python
web_search_tool(
    query="company financial report",
    category="pdf"
)
```

Search Algorithm Selection:

- `auto`: Let Exa choose the best algorithm
- `neural`: AI-powered semantic search
- `deep`: Deep web crawling for comprehensive results
Create Initial Plan:
```python
update_research_plan(
    todos=[
        {"id": "step-1", "status": "pending", "content": "Research X"},
        {"id": "step-2", "status": "pending", "content": "Research Y"},
    ],
    explanation="Creating initial plan"
)
```

Update Status:
```python
# Mark as in progress
update_research_plan(
    todos=[{"id": "step-1", "status": "in_progress"}],
    explanation="Starting step 1"
)

# Mark as completed
update_research_plan(
    todos=[{"id": "step-1", "status": "completed"}],
    explanation="Completed step 1"
)
```

- The agent gathers information from multiple sources
- Cross-references findings for accuracy
- Documents contradictions when found
- All information is cited with source URLs
- Sources include titles, authors, and publication dates
- Easy to verify and follow up on any claim
- TODO list shows exactly what's being researched
- Real-time status updates
- Clear separation of findings by research phase
```
deep-research/
├── main.py                # Main entry point
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (not in git)
├── .research_plan.json    # Current research plan (auto-generated, cleaned each run)
├── README.md              # This file
├── app/
│   ├── __init__.py
│   └── config.py          # Configuration and logging
└── tools/
    ├── __init__.py
    ├── plan.py            # Research planning tool
    └── web_search.py      # Web search and content tools
```
Control research depth via the `--max-iter` parameter:

```bash
# Quick research (fewer steps)
python main.py "Your topic" --max-iter 25

# Standard research
python main.py "Your topic" --max-iter 50

# Deep research (more thorough)
python main.py "Your topic" --max-iter 100
```

The system generates 10-15 TODO items; `--max-iter` caps how many research actions the agent may take in total.
Use different models based on your needs:
```bash
# Fast and cost-effective (default)
python main.py "Your topic" --model openrouter/anthropic/claude-haiku-4.5

# More capable and thorough
python main.py "Your topic" --model openrouter/anthropic/claude-3.5-sonnet

# Other supported models
python main.py "Your topic" --model openrouter/openai/gpt-4
```

Control the LLM's creativity with temperature:
```bash
# Deterministic (factual research)
python main.py "Your topic" --temperature 0.0

# Balanced
python main.py "Your topic" --temperature 0.3

# More creative (exploratory research)
python main.py "Your topic" --temperature 0.7
```

Control which LLM provider to use (when using OpenRouter or similar services):
```bash
# Automatic provider selection (default)
python main.py "Your topic"

# Use Cerebras for ultra-fast responses
python main.py "Your topic" --provider cerebras

# Use a specific provider
python main.py "Your topic" --provider openai
python main.py "Your topic" --provider anthropic

# Multiple providers as fallback (tries in order)
python main.py "Your topic" --provider "cerebras,openai,anthropic"
```

Common Providers:

- `cerebras` - Ultra-fast inference, great for quick research
- `openai` - GPT models, balanced performance
- `anthropic` - Claude models, excellent reasoning
- `together` - Open-source models
- Leave empty for automatic selection based on the model
Enable extended reasoning for more thorough analysis (requires compatible models):
```bash
# Standard mode (faster, direct responses)
python main.py "Your topic"

# Reasoning mode (deeper analysis, step-by-step thinking)
python main.py "Your topic" --reasoning
```

When to use reasoning mode:
- ✅ Complex research topics requiring deep analysis
- ✅ Technical or scientific subjects
- ✅ Multi-faceted questions with nuanced answers
- ✅ When accuracy is more important than speed
Note: Reasoning mode allocates up to 8192 tokens for internal reasoning, which helps the model think through problems more thoroughly but may increase processing time.
- Clear Queries: Provide specific, well-defined research topics
- Domain Hints: Include relevant domains or contexts in your query
- Review Progress: Monitor `.research_plan.json` during long research sessions
- Verify Sources: Always check the sources cited in the final report
- Iterative Refinement: Run multiple research sessions for complex topics
Command:

```bash
python main.py "Arkel.ai french company"
```

Console Output:

```
============================================================
🔍 Deep Research System
============================================================
Query: Arkel.ai french company
Model: openrouter/anthropic/claude-haiku-4.5
Max Iterations: 50
Provider: cerebras
============================================================

📋 Research Plan Monitor
============================================================

🚀 Starting Deep Research...
============================================================
⏱️ Updated: 2025-11-16 13:30:15
📝 Creating initial research plan for Arkel.ai
------------------------------------------------------------
📊 Progress: 12 tasks total
   ⏳ pending: 12

📝 Current Plan:
  1. ⏳ [pending    ] Research company background and history
  2. ⏳ [pending    ] Identify key products and services
  3. ⏳ [pending    ] Find key leadership and team
  ... (updates in real-time as research progresses)
============================================================

... (plan updates continue as tasks are completed)

⏱️ Updated: 2025-11-16 13:35:42
📋 Completed all research tasks
------------------------------------------------------------
📊 Progress: 12 tasks total
   ✅ completed: 12

============================================================
✨ RESEARCH COMPLETED
============================================================

📄 Final Report:

# Arkel.ai: French AI Company Analysis

## Executive Summary
...

## Company Background
...

## Interesting Findings
...

## Sources
1. [Arkel.ai Official Website](https://arkel.ai)
2. ...

============================================================
✅ Research completed at 2025-11-16 13:36:00
============================================================
```
Key Features:
- Real-time plan updates showing progress every 2 seconds
- Status emojis (⏳ pending, 🔄 in_progress, ✅ completed)
- Progress summary with task counts by status
- Detailed task list showing current state of all TODOs
- Final comprehensive report displayed at the end
Contributions are welcome! Areas for improvement:
- Additional research tools
- Better error handling
- More sophisticated planning algorithms
- Export formats (PDF, HTML)
- Research templates for common use cases