Simultaneous Voice Chat with ChatGPT, Claude, Grok, Perplexity & Gemini
A thread-safe Streamlit application that records your voice and streams responses from all 5 major AI models simultaneously for instant comparison.
- Voice Recording: High-quality 16 kHz audio capture with background threading
- Auto-Transcription: OpenAI Whisper API integration with manual editing support
- Multi-Model Streaming: Simultaneous responses from ChatGPT, Claude, Grok, Perplexity & Gemini
- Real-Time Comparison: Side-by-side response streaming in a tabbed interface
- Conversation Logging: JSON logs for each model with timestamps
- Thread-Safe Design: Non-blocking UI with proper concurrency handling
- Flexible Configuration: WebUI + API support for each model
- Smart Auto-Mode: Automatic transcription and response triggering
```bash
# Clone the repository
git clone https://github.com/myaichat/speakstream-ai.git
cd speakstream-ai

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.template .env
# Edit .env with your API keys

# Optional: start Chrome for WebUI features
chrome --remote-debugging-port=9222

# Run the application
streamlit run gemini_model_chat_app.py
```

Requirements:

- Python 3.8+
- Streamlit
- OpenAI API key (for Whisper transcription)
- API keys for desired AI models (ChatGPT, Claude, Grok, Perplexity, Gemini)
- Chrome browser (for WebUI features)
- Microphone access
Each AI model has its own dedicated handler module:
```
chat_handlers/
├── chatgpt_handler.py      # ChatGPT WebUI + API
├── claude_handler.py       # Claude WebUI + API
├── grok_handler.py         # Grok WebUI + API
├── perplexity_handler.py   # Perplexity WebUI + API
└── gemini_handler.py       # Gemini WebUI + API
```
- Audio Recording: Background thread captures voice without blocking UI
- Transcription: OpenAI Whisper converts speech to text
- Concurrent Streaming: All enabled models process simultaneously
- Centralized State: 200+ session variables manage everything
- Thread Safety: `add_script_run_ctx` enables safe UI updates from background threads
- Enable Models: Check WebUI/API boxes for models you want to use
- Set Auto-Mode: Enable auto-transcribe and auto-response for seamless experience
- Record: Press Start, speak, then press Stop
- Compare: Watch responses stream into separate tabs
- Refine: Edit the transcription and press Ctrl+Enter to re-query
- Smart Auto-Triggering: Some models start immediately when enabled
- Emergency Stop: Stop all streaming with one button
- Manual Transcription Editing: Refine voice-to-text before querying
- Conversation History: JSON logs for each model and session
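The per-model JSON logging can be done with the standard library alone. A sketch of one possible approach (the file layout and `log_turn` helper are illustrative, not the app's exact format):

```python
import json
import os
import time

LOG_DIR = "logs"  # auto-created, as in the project structure

def log_turn(model: str, question: str, answer: str) -> str:
    """Append one timestamped Q/A turn to the model's JSON log."""
    os.makedirs(LOG_DIR, exist_ok=True)
    path = os.path.join(LOG_DIR, f"{model}_conversation.json")
    entries = []
    if os.path.exists(path):
        with open(path) as f:
            entries = json.load(f)
    entries.append({
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "question": question,
        "answer": answer,
    })
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)
    return path

log_path = log_turn("claude", "What is 2+2?", "4")
```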
Create a `.env` file with your API keys:

```
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_claude_key
XAI_API_KEY=your_grok_key
PERPLEXITY_API_KEY=your_perplexity_key
GOOGLE_API_KEY=your_gemini_key
```

Audio settings (defined in the main app):

```python
RATE, CH = 16_000, 1   # 16 kHz mono for optimal voice recognition
OUT_DIR = "recordings" # Timestamped WAV files
TIMEOUT_SEC = 60       # Max recording length
```

Project structure:

```
speakstream-ai/
├── gemini_model_chat_app.py   # Main multi-model app
├── simple_chat_app.py         # Simplified single-model version
├── .env.template              # API key template
├── requirements.txt           # Python dependencies
├── chat_handlers/             # AI model handlers
│   ├── chatgpt_handler.py
│   ├── claude_handler.py
│   ├── grok_handler.py
│   ├── perplexity_handler.py
│   └── gemini_handler.py
├── include/                   # Core utilities
│   └── transcribe.py          # Whisper integration
├── logs/                      # Conversation logs (auto-created)
├── recordings/                # Audio files (auto-created)
└── docs/                      # Documentation
```
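The recorder fills `recordings/` using the 16 kHz mono settings shown earlier. A stdlib-only sketch of the timestamped-WAV writing step (assuming raw 16-bit PCM frames; `save_wav` is an illustrative name, not the app's actual function):

```python
import os
import time
import wave

RATE, CH = 16_000, 1   # 16 kHz mono, matching the app's defaults
OUT_DIR = "recordings"

def save_wav(frames: bytes) -> str:
    """Write raw 16-bit PCM frames to a timestamped WAV file."""
    os.makedirs(OUT_DIR, exist_ok=True)
    name = time.strftime("rec_%Y%m%d_%H%M%S") + ".wav"
    path = os.path.join(OUT_DIR, name)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CH)
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(RATE)
        wf.writeframes(frames)
    return path

# One second of silence stands in for captured audio
wav_path = save_wav(b"\x00\x00" * RATE)
```

In the real app, a library such as `sounddevice` would supply the captured frames from a background thread.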
Want to add a sixth model? Follow this pattern:
```python
# chat_handlers/your_model_handler.py

def start_concurrent_streaming(question):
    """Start WebUI and/or API streams."""
    # Implementation here

def render_your_model_responses():
    """Display responses in two columns."""
    # UI rendering here

def handle_concurrent_streaming(question, stream_type):
    """Background streaming logic."""
    # Streaming implementation here

def handle_stopped_streaming():
    """Cleanup on stop."""
    # Cleanup logic here
```

```python
# Add to main app initialization
if "your_model_concurrent_streaming_active" not in st.session_state:
    st.session_state.your_model_concurrent_streaming_active = False
# ... add all necessary state variables
```

```python
# Add to model selection, tabs, and streaming handler
# Follow the existing patterns for ChatGPT, Claude, etc.
```

Performance tips:

- Batch Token Updates: Update the UI every ~20 tokens instead of per token
- Selective Model Enabling: Only enable the models you're actively comparing
- Local Whisper: Use a local whisper-tiny model for heavy transcription workloads
- Chrome Debugging: Use `--remote-debugging-port=9222` for free WebUI response scraping
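The token-batching tip can be sketched as a small wrapper around the token stream (a hypothetical helper, not the app's exact code; in the app you would pass a real `st.empty()` placeholder):

```python
def stream_with_batching(tokens, placeholder=None, batch_size=20):
    """Flush accumulated tokens to the UI every `batch_size` tokens."""
    text, buffer = "", []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= batch_size:
            text += "".join(buffer)
            buffer.clear()
            if placeholder is not None:
                placeholder.markdown(text)  # one UI update per batch
    text += "".join(buffer)  # flush whatever is left over
    if placeholder is not None:
        placeholder.markdown(text)
    return text

full = stream_with_batching(list("streaming works"))
```

Batching trades a slight delay in on-screen text for far fewer Streamlit reruns, which keeps five concurrent streams responsive.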
| Problem | Solution |
|---|---|
| `RuntimeError: Cannot call st.* from a thread` | Ensure `add_script_run_ctx(thread)` is called before starting threads |
| Streaming text doesn't appear | Check that `st.rerun()` is called in the central streaming loop |
| No audio captured | Check microphone permissions and the `sounddevice` installation |
| Model not responding | Verify API keys in `.env` and check the handler implementation |
- Streaming Audio Upload: Direct S3 upload for long conversations
- Multimodal Support: Screenshot-to-prompt for visual questions
- Response Scoring: A/B testing widget with CSV export
- Analytics Dashboard: Latency and tokens-per-second metrics
- Local LLM Support: Ollama and LM Studio integration
- Text-to-Speech: Hear responses back with voice synthesis
We welcome contributions! Here's how you can help:
- Star the repository to show your support
- Report issues if you encounter problems
- Suggest features for new AI models or capabilities
- Submit pull requests for improvements
- Improve documentation and tutorials
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and test thoroughly
- Submit a pull request with a clear description
This project is open source and available under the MIT License.
- OpenAI for Whisper API
- Streamlit for the amazing web framework
- All AI model providers for their APIs
- The open source community for inspiration and feedback
- Documentation: Check the `/docs` folder for detailed guides
- Issues: Report bugs via GitHub Issues
- Discussions: Join conversations in GitHub Discussions
- Contact: Reach out for collaboration opportunities
SpeakStream AI - Transforming voice into insights across multiple AI models simultaneously. Whether you're researching AI model differences, building comparison tools, or exploring multi-model conversations, this solution provides a solid foundation that's both powerful and extensible.
Ready to start? Clone the repo, configure your API keys, and begin exploring the fascinating world of multi-model AI conversations!
Tags: streamlit voice-ai chatgpt claude grok perplexity gemini python threading concurrent whisper ai-comparison
