A state-of-the-art AI Voice Assistant with speech recognition, text-to-speech, voice cloning, and real-time conversation capabilities. Features a stunning modern React frontend with real-time voice visualization.
🎉 Now with FREE Open Source Voice Services! Use Coqui TTS and Whisper locally - no API costs, complete privacy, fully offline capable.
- Speech-to-Text: OpenAI Whisper API OR free local Whisper
- Text-to-Speech: ElevenLabs API OR free local Coqui TTS with voice cloning
- Voice Cloning: Clone voices from audio samples (XTTSv2)
- AI Conversations: Multi-provider support (OpenAI, Groq, Ollama, Together, Cerebras)
- Real-time Streaming: WebSocket support for live voice interactions
- Production Ready: Rate limiting, metrics, logging
- Zero-Cost Mode: Run completely offline with open source models
- Real-time Voice Visualization - Animated waveform display
- Chat Interface - Beautiful message history with markdown support
- Voice Settings - Control pitch, speed, stability, and similarity boost
- Multi-language Support - 10+ languages
- Provider Selector - Choose your AI backend
- Dark/Light Mode - With system preference detection
- Export Conversations - Download chat history
- Premium Animations - Smooth, stunning UI with Framer Motion
- Fully Responsive - Works on desktop, tablet, and mobile
Run voice synthesis and recognition completely free using open source models:
| Feature | API Version | Open Source Version |
|---|---|---|
| TTS | ElevenLabs ($0.015/1K chars) | Coqui TTS (FREE) |
| STT | OpenAI Whisper ($0.006/min) | Whisper Local (FREE) |
| Voice Cloning | ElevenLabs | XTTSv2 (FREE) |
| Languages | 29+ | 1100+ languages |
| Privacy | Cloud processing | 100% local |
| Offline | ❌ Requires internet | ✅ Fully offline |
# 1. Run the setup script
setup-local-voice.bat # Windows
# OR
./setup-local-voice.sh # Linux/Mac
# 2. Enable in .env file
USE_LOCAL_VOICE=true
WHISPER_MODEL=base
TTS_MODEL=tts_models/multilingual/multi-dataset/xtts_v2
# 3. Start as normal
npm start

📖 Full Guide: LOCAL_VOICE_SETUP.md
- Node.js 18+
- Python 3.9+ (for open source voice services)
- API Keys (optional if using local voice services):
  - OpenAI API key
  - ElevenLabs API key (optional, for voice cloning)
cd "AI Voice Assistant"
npm install
# Setup environment
cp .env.example .env
# Edit .env and add your API keys (or use local voice)
# Start backend server
npm run dev
# In a new terminal, start frontend
cd client
npm install
npm run dev

The frontend will be available at http://localhost:5173 and the backend at http://localhost:3000.
POST /api/speech-to-text
Content-Type: multipart/form-data
audio: <audio-file>
language: en

POST /api/text-to-speech
Content-Type: application/json
{
"text": "Hello, how are you?",
"voice": "Rachel",
"stability": 0.5,
"similarity_boost": 0.75
}

POST /api/chat
Content-Type: application/json
{
"messages": [
{ "role": "user", "content": "Hello!" }
],
"model": "gpt-4",
"temperature": 0.7
}

GET /api/voices

GET /api/providers

Connect to ws://localhost:3000 for real-time voice conversations.
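The REST endpoints listed above can be called from Node 18+ with the built-in `fetch`, `FormData`, and `Blob`. The following is a minimal sketch, not the project's own client: the helper names (`buildTtsRequest`, `buildChatRequest`, `buildSttForm`, `sendChat`), the CommonJS style, the `recording.wav` file name, and the assumption that `/api/chat` returns JSON are all illustrative, since the response formats are not documented here.

```javascript
// Assumed backend address, taken from the default PORT of 3000.
const BASE_URL = "http://localhost:3000";

// Build the fetch options for POST /api/text-to-speech.
function buildTtsRequest(text, voice = "Rachel") {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice, stability: 0.5, similarity_boost: 0.75 }),
  };
}

// Build the fetch options for POST /api/chat.
function buildChatRequest(messages, model = "gpt-4", temperature = 0.7) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, model, temperature }),
  };
}

// Build the multipart form for POST /api/speech-to-text.
// When fetch is given a FormData body it sets the multipart Content-Type itself.
function buildSttForm(audioBytes, language = "en") {
  const form = new FormData();
  form.append("audio", new Blob([audioBytes]), "recording.wav"); // placeholder file name
  form.append("language", language);
  return form;
}

// Send one user message to the chat endpoint.
// Not called here because it needs a running server.
async function sendChat(content) {
  const res = await fetch(`${BASE_URL}/api/chat`, buildChatRequest([{ role: "user", content }]));
  if (!res.ok) throw new Error(`Chat request failed with status ${res.status}`);
  return res.json(); // assumption: the endpoint responds with JSON
}
```

For speech-to-text, the form would be passed as `fetch(`${BASE_URL}/api/speech-to-text`, { method: "POST", body: buildSttForm(bytes) })`.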
Client -> Server:
- audio-stream: Send audio chunks
- text-message: Send text messages

Server -> Client:
- session: Session ID assignment
- transcription: Transcribed text
- ai-response: AI text response
- audio-response: AI audio response (base64)
- error: Error messages
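The events above could be wired up roughly as follows. This sketch works against any socket-like object that exposes `on` and `emit`, such as a `socket.io-client` socket (an assumption based on Socket.io Client appearing in the frontend stack). The function names and the payload shapes (plain strings, bare base64 audio) are hypothetical; the server's actual payloads are not documented here.

```javascript
// Server -> Client event names, as listed in this README.
const SERVER_EVENTS = ["session", "transcription", "ai-response", "audio-response", "error"];

// Register one handler per server event and return helpers for the client events.
function attachVoiceHandlers(socket, handlers) {
  for (const name of SERVER_EVENTS) {
    // Use a no-op for events the caller does not care about.
    socket.on(name, handlers[name] ?? (() => {}));
  }
  return {
    sendText: (text) => socket.emit("text-message", text),
    sendAudioChunk: (chunk) => socket.emit("audio-stream", chunk),
  };
}

// Decode an audio-response payload, assuming it is a bare base64 string.
function decodeAudioResponse(base64Audio) {
  return Buffer.from(base64Audio, "base64");
}
```

With `socket.io-client` installed, the socket would typically come from `const { io } = require("socket.io-client"); const socket = io("http://localhost:3000");`.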
| Variable | Description | Default |
|---|---|---|
| USE_LOCAL_VOICE | Enable local TTS/STT | false |
| TTS_MODEL | Coqui TTS model | tts_models/multilingual/multi-dataset/xtts_v2 |
| WHISPER_MODEL | Whisper model (tiny/base/small/medium/large/turbo) | base |
| VOICE_DEVICE | Compute device (auto/cpu/cuda) | auto |
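As an illustration of how these variables and their defaults fit together, the sketch below resolves them from an env-like object and rejects values outside the documented options. `loadVoiceConfig` is a hypothetical helper, and treating only the literal string `true` as enabled is an assumption; the server's real parsing may differ.

```javascript
// Allowed values, as listed in the table above.
const WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"];
const VOICE_DEVICES = ["auto", "cpu", "cuda"];

// Resolve local-voice settings, applying the documented defaults.
function loadVoiceConfig(env = process.env) {
  const config = {
    useLocalVoice: env.USE_LOCAL_VOICE === "true", // assumption: only "true" enables it
    ttsModel: env.TTS_MODEL || "tts_models/multilingual/multi-dataset/xtts_v2",
    whisperModel: env.WHISPER_MODEL || "base",
    voiceDevice: env.VOICE_DEVICE || "auto",
  };
  if (!WHISPER_MODELS.includes(config.whisperModel)) {
    throw new Error(`Unsupported WHISPER_MODEL: ${config.whisperModel}`);
  }
  if (!VOICE_DEVICES.includes(config.voiceDevice)) {
    throw new Error(`Unsupported VOICE_DEVICE: ${config.voiceDevice}`);
  }
  return config;
}
```

Failing fast on an unknown model or device name surfaces a typo in `.env` at startup rather than on the first transcription request.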
| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3000 |
| OPENAI_API_KEY | OpenAI API key | Optional* |
| ELEVENLABS_API_KEY | ElevenLabs API key | Optional |
| GROQ_API_KEY | Groq API key | Optional |
| TOGETHER_API_KEY | Together AI key | Optional |
| CEREBRAS_API_KEY | Cerebras API key | Optional |
| OLLAMA_ENABLED | Enable Ollama local LLM | false |
| OLLAMA_HOST | Ollama server URL | http://localhost:11434 |
| Variable | Description | Default |
|---|---|---|
| NODE_ENV | Environment mode | development |
| CORS_ORIGIN | CORS origin | * |
| RATE_LIMIT_WINDOW_MS | Rate limit window | 60000 |
| RATE_LIMIT_MAX_REQUESTS | Max requests per window | 50 |
| ENABLE_PROVIDER_FALLBACK | Fallback to APIs if local fails | true |
* Required only if USE_LOCAL_VOICE=false and no other chat provider is configured.
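To make the meaning of RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX_REQUESTS concrete, here is a small fixed-window counter using the default values from the table. This README does not say how the server implements rate limiting, so this is an illustrative stand-in, and `createRateLimiter` is a hypothetical name.

```javascript
// Fixed-window rate limiter: at most maxRequests per client key in each window.
function createRateLimiter({ windowMs = 60000, maxRequests = 50 } = {}) {
  const windows = new Map(); // client key -> { start, count }

  // Returns true if the request is allowed, false if the limit is exceeded.
  return function allow(key, now = Date.now()) {
    const entry = windows.get(key);
    if (!entry || now - entry.start >= windowMs) {
      // First request from this key, or the previous window has expired.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}
```

With the defaults, a single client key (for example an IP address) gets 50 requests per 60-second window, and further requests are refused until the window resets.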
AI Voice Assistant/
├── client/ # React frontend
│ ├── src/
│ │ ├── components/ # React components
│ │ │ ├── Layout.jsx
│ │ │ ├── Header.jsx
│ │ │ ├── Sidebar.jsx
│ │ │ ├── ChatInterface.jsx
│ │ │ ├── VoiceRecorder.jsx
│ │ │ ├── AudioVisualizer.jsx
│ │ │ ├── VoiceSettings.jsx
│ │ │ └── ProviderSelector.jsx
│ │ ├── contexts/ # React contexts
│ │ │ ├── ThemeContext.jsx
│ │ │ └── VoiceContext.jsx
│ │ ├── styles/
│ │ ├── App.jsx
│ │ └── main.jsx
│ ├── package.json
│ ├── vite.config.js
│ └── tailwind.config.js
├── server/
│ └── index.js # Main server entry
├── tests/
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── logs/ # Log files
├── uploads/ # Temporary upload storage
├── audio-cache/ # Cached TTS audio
├── package.json
├── Dockerfile
├── docker-compose.yml
├── jest.config.js
├── .env.example
└── README.md
The frontend is built with modern React:
cd client
npm install
npm run dev

- React 18 with Hooks
- Vite for fast development
- Tailwind CSS for styling
- Framer Motion for animations
- Socket.io Client for real-time communication
- Zustand for state management
cd client
npm run build

This creates an optimized build in client/dist/.
# Build and start all services
docker-compose up -d --build
# View logs
docker-compose logs -f
# Scale to multiple instances
docker-compose up -d --scale voice-assistant=3

# Install dependencies
npm ci --only=production
# Build frontend
cd client
npm ci
npm run build
cd ..
# Set environment variables
export NODE_ENV=production
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
# Start server
npm start

- Health Check: GET /health
- Metrics: GET /metrics (Prometheus format)
- Logs: Check the logs/ directory
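A deployment script might probe the health endpoint before routing traffic to a new instance. The sketch below assumes only that `GET /health` answers with a 2xx status when the server is up; the response body is not documented here, so it is ignored. `checkHealth` and its injectable `fetchImpl` parameter are hypothetical, added so the function can be exercised without a running server.

```javascript
// Probe GET /health and report whether the server answered with a 2xx status.
async function checkHealth(baseUrl = "http://localhost:3000", fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    return { healthy: res.ok, status: res.status };
  } catch (err) {
    // Network errors (server down, connection refused) count as unhealthy.
    return { healthy: false, status: null, error: err.message };
  }
}
```

Calling `checkHealth()` with no arguments uses the global `fetch` available in Node 18+.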
# Run all tests
npm test
# Run unit tests only
npm run test:unit
# Run integration tests only
npm run test:integration
# Run frontend tests (if available)
cd client
npm test

| Provider | Type | Free | Local |
|---|---|---|---|
| Ollama | Chat | ✅ | ✅ |
| Groq | Chat | ✅ | ❌ |
| Together AI | Chat | ✅ | ❌ |
| Cerebras | Chat | ✅ | ❌ |
| OpenAI | Chat, STT, TTS | ❌ | ❌ |
| Provider | Service | Free | Local | Voice Cloning |
|---|---|---|---|---|
| Coqui TTS | TTS | ✅ FREE | ✅ | ✅ XTTSv2 |
| Whisper | STT | ✅ FREE | ✅ | - |
| ElevenLabs | TTS | ✅ (limited) | ❌ | ✅ |
| OpenAI | STT, TTS | ❌ | ❌ | ❌ |
💡 Recommendation: Use local services for unlimited free usage, and keep API providers configured as a fallback for reliability.
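The local-first, API-as-fallback pattern recommended above can be sketched as follows. The function names and the `local` and `api` callbacks are hypothetical placeholders for whatever actually performs synthesis. Reading ENABLE_PROVIDER_FALLBACK as "enabled unless set to `false`" is an assumption based on its documented default of true.

```javascript
// Interpret ENABLE_PROVIDER_FALLBACK; anything other than "false" keeps the default (true).
function isFallbackEnabled(env = process.env) {
  return env.ENABLE_PROVIDER_FALLBACK !== "false";
}

// Try the local provider first; on failure, use the API provider if fallback is enabled.
async function synthesizeWithFallback(text, { local, api, fallbackEnabled = true }) {
  try {
    return { provider: "local", audio: await local(text) };
  } catch (err) {
    if (!fallbackEnabled) throw err; // surface the local failure unchanged
    return { provider: "api", audio: await api(text) };
  }
}
```

Returning the provider name alongside the audio lets callers log or count how often the fallback path is taken.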
- No known dependency vulnerabilities (npm audit reports 0 issues)
- Rate limiting enabled by default
- Helmet.js security headers
- CORS protection
- File type validation
- Non-root Docker user
- Chrome/Edge 90+
- Firefox 88+
- Safari 14+
See DEPLOYMENT_READINESS_REPORT.md for detailed production readiness assessment.
MIT