A production-quality multi-agent voice assistant built with the OpenAI Agents SDK. Speak a question — the system routes it to the right specialist agent, reasons over it, and responds with natural voice.
- 🎙 Speak — hold the mic button or press Space to ask anything
- 🧠 Routes — General Agent decides which specialist handles the request
- 🔍 Researches — Research Agent searches the web and summarizes results
- 💻 Codes — Code Agent explains algorithms and solves math problems
- 🔊 Responds — answer played back as natural voice (OpenAI TTS nova)
- 💾 Remembers — full conversation history persists across sessions (SQLite)
| Layer | Technology |
|---|---|
| Agent Framework | OpenAI Agents SDK 0.11.1 |
| LLM | GPT-4o (specialists) · GPT-4o-mini (triage) |
| Speech-to-Text | OpenAI speech-to-text (gpt-4o-transcribe) |
| Text-to-Speech | OpenAI TTS (tts-1-hd · nova voice) |
| Memory | SQLiteSession — persists across restarts |
| Backend | FastAPI + Uvicorn |
| Frontend | Vanilla HTML · CSS · JavaScript |

                    User speaks
                         │
                         ▼
              STT — gpt-4o-transcribe
                         │
                         ▼
    ┌──────────────────────────────────────────┐
    │              General Agent               │
    │  • Input guardrails (jailbreak, empty)   │
    │  • Output guardrails (PII, length cap)   │
    │  • Routes by question type               │
    └──────────┬───────────────┬───────────────┘
               │               │
      Research?│               │ Code / Math?
               ▼               ▼
       ┌──────────────┐ ┌──────────────┐
       │ Research     │ │ Code Agent   │
       │ Agent        │ │              │
       │ tools:       │ │ tools:       │
       │ • web_search │ │ • calculator │
       │ • summarizer │ └──────────────┘
       └──────────────┘
                       │
                       ▼
          TTS — tts-1-hd · nova voice
                       │
                       ▼
            🔊 Browser plays audio
                       │
                       ▼
           SQLiteSession saves turn
          (memory persists next run)

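
The guardrail checks named in the diagram (jailbreak and empty input on the way in, PII and a length cap on the way out) can be pictured with a small standard-library sketch. This is only an illustration of the kind of logic such checks might contain, not the code in `guardrails/`: the phrase list, the regular expressions, the 600-character cap, and the function names `check_input` and `check_output` are all assumptions, and the SDK's `@input_guardrail` / `@output_guardrail` wiring is left out.

```python
import re
from typing import Optional

# Hypothetical rules for illustration; the project's real rules are not shown in this README.
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "pretend you have no rules",
)
MAX_RESPONSE_CHARS = 600  # assumed cap, chosen only for this example

# Deliberately simple patterns: an email address and a 10-digit phone number.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def check_input(text: str) -> Optional[str]:
    """Return a reason string if the input should be blocked, else None."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    lowered = stripped.lower()
    if any(phrase in lowered for phrase in JAILBREAK_PHRASES):
        return "jailbreak"
    return None


def check_output(text: str) -> Optional[str]:
    """Return a reason string if the response should be blocked, else None."""
    if EMAIL_RE.search(text) or PHONE_RE.search(text):
        return "pii"
    if len(text) > MAX_RESPONSE_CHARS:
        return "too_long"
    return None
```

In a design like this, a non-`None` result would correspond to the guardrail tripping, so the request or response is stopped before it reaches the next stage of the pipeline.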
| Concept | File |
|---|---|
| Multi-agent handoffs | agents/general_agent.py |
| Typed handoff metadata (input_type) | agents/general_agent.py |
| on_handoff callbacks | agents/general_agent.py |
| @function_tool | tools/web_search.py · tools/summarizer.py · tools/calculator.py |
| @input_guardrail | guardrails/input_guards.py |
| @output_guardrail | guardrails/output_guards.py |
| Custom VoiceWorkflowBase subclass | workflow/session_workflow.py |
| VoicePipeline + VoicePipelineConfig | main.py (CLI) |
| SQLiteSession persistent memory | session/memory.py |
| Runner.run_streamed() | workflow/session_workflow.py |
- Python 3.10+
- OpenAI API key with GPT-4o access
- PortAudio (for CLI mode only)

Install PortAudio (needed only for CLI mode):

    # macOS
    brew install portaudio

Clone the repository and install the dependencies:

    git clone https://github.com/your-username/voice-research-assistant.git
    cd voice-research-assistant
    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    cp .env.example .env
    # Add your OPENAI_API_KEY to .env

Start the web UI:

    python server.py

- Hold the 🎙 button (or press Space) → speak → release → agent responds
- Click the session badge (top right) to start a fresh conversation

Or run the CLI:

    python -m src.voice_research_assistant.main

Press ENTER to start/stop recording. Ctrl+C to quit.

| Type | Example |
|---|---|
| Research | "What is quantum entanglement?" |
| Current events | "What happened with OpenAI recently?" |
| Math | "What is 2 to the power of 32?" |
| Code | "Explain what a Python generator is" |
| Simple | "What can you help me with?" |
Conversation history is saved to data/conversations.db.
The agent remembers previous turns — even after you restart.
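
To make the persistence behaviour concrete, here is a minimal standard-library sketch of per-session history that survives a restart. It is not the schema or API of the SDK's `SQLiteSession`; the `turns` table and the helper names `open_store`, `add_turn`, and `history` are hypothetical and exist only for this illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session; the with-block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def history(conn: sqlite3.Connection, session_id: str) -> list:
    """Return the session's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [(role, content) for role, content in rows]


def demo() -> tuple:
    """Write two turns, reopen the file as a restart would, and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        db = str(Path(tmp) / "conversations.db")
        conn = open_store(db)
        add_turn(conn, "default", "user", "What is quantum entanglement?")
        add_turn(conn, "default", "assistant", "Two particles share one quantum state.")
        conn.close()  # simulate the process exiting

        conn = open_store(db)  # simulate the next run
        restored = history(conn, "default")
        fresh = history(conn, "new-session")
        conn.close()
        return restored, fresh
```

Because rows are keyed by a session identifier, a new identifier starts with an empty history while the old conversation stays on disk, and deleting the database file clears everything.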

    # Start a fresh session
    SESSION_ID=new-session python server.py
    # Reset memory entirely
    rm data/conversations.db

Project layout:

    voice-research-assistant/
    ├── server.py  # FastAPI server + web UI entry point
    ├── .env.example  # Environment variable template
    ├── requirements.txt
    ├── assets/
    │   └── demo.png
    ├── frontend/
    │   ├── index.html  # Web UI
    │   └── static/
    │       ├── app.js  # Recording, API calls, playback
    │       └── style.css  # Dark theme
    └── src/voice_research_assistant/
        ├── main.py  # CLI entry point
        ├── config.py  # Environment variables
        ├── api/
        │   └── voice_handler.py  # STT → Agent → TTS pipeline for web
        ├── audio/
        │   ├── recorder.py  # Push-to-talk mic capture (CLI)
        │   └── player.py  # Real-time audio playback (CLI)
        ├── agents/
        │   ├── general_agent.py  # Triage agent with guardrails + handoffs
        │   ├── research_agent.py  # Web search specialist
        │   └── code_agent.py  # Code and math specialist
        ├── tools/
        │   ├── web_search.py  # DuckDuckGo (no API key needed)
        │   ├── summarizer.py  # Condenser via gpt-4o-mini
        │   └── calculator.py  # Safe AST-based arithmetic
        ├── guardrails/
        │   ├── input_guards.py  # Jailbreak + empty input detection
        │   └── output_guards.py  # PII detection + response length cap
        ├── workflow/
        │   └── session_workflow.py  # Custom VoiceWorkflowBase + SQLiteSession
        └── session/
            └── memory.py  # SQLiteSession factory

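
The layout above describes `calculator.py` as safe AST-based arithmetic. A minimal sketch of that general technique follows: parse the expression with Python's `ast` module and evaluate only whitelisted numeric nodes, so that names, calls, and attribute access are rejected rather than executed. The function name `safe_eval`, the operator whitelist, and the exponent limit are assumptions for illustration, not the project's actual code, and the `@function_tool` wrapper is omitted.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
MAX_EXPONENT = 1000  # assumed limit to avoid huge power computations


def _eval_node(node):
    """Recursively evaluate a node, allowing only numbers and whitelisted operators."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression: {type(node).__name__}")


def safe_eval(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


print(safe_eval("2 ** 32"))  # prints 4294967296
```

An input such as `__import__('os').getcwd()` parses to a call node, which is not on the whitelist, so `safe_eval` raises `ValueError` instead of running it.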
- LangGraph Agents — same agent concepts built with LangGraph for comparison
