AI Voice Assistant

An AI voice assistant with speech recognition, text-to-speech, voice cloning, and real-time conversation. It includes a React frontend with live voice visualization.

🎉 Now with FREE Open Source Voice Services! Run Coqui TTS and Whisper locally: no API costs, audio processed on your own machine, and fully offline operation.

Features

Backend

Speech-to-Text: OpenAI Whisper API OR free local Whisper
Text-to-Speech: ElevenLabs API OR free local Coqui TTS with voice cloning
Voice Cloning: Clone voices from audio samples (XTTSv2)
AI Conversations: Multi-provider support (OpenAI, Groq, Ollama, Together, Cerebras)
Real-time Streaming: WebSocket support for live voice interactions
Production Ready: Rate limiting, metrics, logging
Zero-Cost Mode: Run completely offline with open source models

Frontend

Real-time Voice Visualization - Animated waveform display
Chat Interface - Beautiful message history with markdown support
Voice Settings - Control pitch, speed, stability, and similarity boost
Multi-language Support - 10+ languages
Provider Selector - Choose your AI backend
Dark/Light Mode - With system preference detection
Export Conversations - Download chat history
Premium Animations - Smooth, stunning UI with Framer Motion
Fully Responsive - Works on desktop, tablet, and mobile

🆓 Free Open Source Voice Services (New!)

Run voice synthesis and recognition completely free using open source models:

Feature	API Version	Open Source Version
TTS	ElevenLabs ($0.015/1K chars)	Coqui TTS (FREE)
STT	OpenAI Whisper ($0.006/min)	Whisper Local (FREE)
Voice Cloning	ElevenLabs	XTTSv2 (FREE)
Languages	29+	1100+ via Coqui Fairseq models (XTTSv2 itself covers 17)
Privacy	Cloud processing	100% local
Offline	❌ Requires internet	✅ Fully offline

Quick Setup (Open Source)

# 1. Run the setup script
setup-local-voice.bat  # Windows
# OR
./setup-local-voice.sh # Linux/Mac

# 2. Enable in .env file
USE_LOCAL_VOICE=true
WHISPER_MODEL=base
TTS_MODEL=tts_models/multilingual/multi-dataset/xtts_v2

# 3. Start as normal
npm start

📖 Full Guide: LOCAL_VOICE_SETUP.md

Quick Start (Traditional)

Prerequisites

Node.js 18+
Python 3.9+ (for open source voice services)
API Keys (optional if using local voice services):
- OpenAI API key
- ElevenLabs API key (optional, for voice cloning)

Installation

cd "AI Voice Assistant"
npm install

# Setup environment
cp .env.example .env
# Edit .env and add your API keys (or use local voice)

# Start backend server
npm run dev

# In a new terminal, start frontend
cd client
npm install
npm run dev

The frontend will be available at http://localhost:5173 and the backend at http://localhost:3000.

API Documentation

Speech-to-Text

POST /api/speech-to-text
Content-Type: multipart/form-data

audio: <audio-file>
language: en

Text-to-Speech

POST /api/text-to-speech
Content-Type: application/json

{
  "text": "Hello, how are you?",
  "voice": "Rachel",
  "stability": 0.5,
  "similarity_boost": 0.75
}
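
A hedged sketch of building and sending the body above from Node.js 18+. The helper names are hypothetical, the 0 to 1 range for stability and similarity_boost is an assumption based on the example values, and returning raw audio bytes is also an assumption since the response format is not documented here.

```javascript
// Clamps a number into [min, max]; used for the voice settings,
// which are assumed here to live in the 0..1 range.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Builds the JSON body shown above. Defaults mirror the example values.
function buildTtsBody(text, { voice = 'Rachel', stability = 0.5, similarityBoost = 0.75 } = {}) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  return {
    text,
    voice,
    stability: clamp(stability, 0, 1),
    similarity_boost: clamp(similarityBoost, 0, 1),
  };
}

// Posts the body to the backend and returns the response as bytes.
async function synthesize(text, options, baseUrl = 'http://localhost:3000') {
  const response = await fetch(`${baseUrl}/api/text-to-speech`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTtsBody(text, options)),
  });
  if (!response.ok) {
    throw new Error(`Text-to-speech request failed with status ${response.status}`);
  }
  return response.arrayBuffer(); // assumption: the endpoint returns raw audio
}
```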

AI Chat

POST /api/chat
Content-Type: application/json

{
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "model": "gpt-4",
  "temperature": 0.7
}
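
The request above can be assembled from a running conversation history. The sketch below assumes Node.js 18+ and OpenAI-style roles (system, user, assistant); the helper names are hypothetical and the response shape is not documented here.

```javascript
// Appends a user turn to an existing history without mutating it.
function appendUserMessage(history, content) {
  return [...history, { role: 'user', content }];
}

// Builds the JSON body shown above. Defaults mirror the example values.
// The accepted role names are an assumption (OpenAI-style chat roles).
function buildChatBody(messages, { model = 'gpt-4', temperature = 0.7 } = {}) {
  const validRoles = new Set(['system', 'user', 'assistant']);
  for (const message of messages) {
    if (!validRoles.has(message.role) || typeof message.content !== 'string') {
      throw new TypeError(`Invalid chat message: ${JSON.stringify(message)}`);
    }
  }
  return { messages, model, temperature };
}

// Posts the conversation to the backend and returns the parsed JSON reply.
async function chat(messages, options, baseUrl = 'http://localhost:3000') {
  const response = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatBody(messages, options)),
  });
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  return response.json();
}
```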

Get Voices

GET /api/voices

Get Providers

GET /api/providers

WebSocket Real-time API

Connect to ws://localhost:3000 for real-time voice conversations.

Events

Client -> Server:

audio-stream: Send audio chunks
text-message: Send text messages

Server -> Client:

session: Session ID assignment
transcription: Transcribed text
ai-response: AI text response
audio-response: AI audio response (base64)
error: Error messages
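
One way to consume the server events above is a small pure reducer that folds each event into conversation state. This is a sketch, not the project's client code: the event names come from the list above, but the payload field names (id, text, audio, message) are assumptions. Buffer is Node-specific; a browser client would decode base64 differently.

```javascript
const initialState = { sessionId: null, messages: [], lastAudio: null, lastError: null };

// Returns the next state for one server event. Unknown events leave
// the state unchanged. Payload field names are assumptions.
function applyServerEvent(state, event, payload) {
  switch (event) {
    case 'session':
      return { ...state, sessionId: payload.id };
    case 'transcription':
      return { ...state, messages: [...state.messages, { role: 'user', content: payload.text }] };
    case 'ai-response':
      return { ...state, messages: [...state.messages, { role: 'assistant', content: payload.text }] };
    case 'audio-response':
      // The audio arrives base64-encoded; decode it into raw bytes.
      return { ...state, lastAudio: Buffer.from(payload.audio, 'base64') };
    case 'error':
      return { ...state, lastError: payload.message };
    default:
      return state;
  }
}

// Wiring sketch (not executed here). The Tech Stack lists Socket.io Client,
// so this assumes the server speaks the Socket.IO protocol, in which case
// the client connects with an http:// URL:
//   const socket = io('http://localhost:3000');
//   let state = initialState;
//   for (const name of ['session', 'transcription', 'ai-response', 'audio-response', 'error']) {
//     socket.on(name, (payload) => { state = applyServerEvent(state, name, payload); });
//   }
//   socket.emit('text-message', { text: 'Hello!' });
```

Keeping the reducer free of socket code means the event handling can be unit tested without a live connection.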

Environment Variables

Local Voice Services (Open Source)

Variable	Description	Default
`USE_LOCAL_VOICE`	Enable local TTS/STT	false
`TTS_MODEL`	Coqui TTS model	tts_models/multilingual/multi-dataset/xtts_v2
`WHISPER_MODEL`	Whisper model (tiny/base/small/medium/large/turbo)	base
`VOICE_DEVICE`	Compute device (auto/cpu/cuda)	auto

API Providers (Optional)

Variable	Description	Default
`PORT`	Server port	3000
`OPENAI_API_KEY`	OpenAI API key	Optional*
`ELEVENLABS_API_KEY`	ElevenLabs API key	Optional
`GROQ_API_KEY`	Groq API key	Optional
`TOGETHER_API_KEY`	Together AI key	Optional
`CEREBRAS_API_KEY`	Cerebras API key	Optional
`OLLAMA_ENABLED`	Enable Ollama local LLM	false
`OLLAMA_HOST`	Ollama server URL	http://localhost:11434

General Settings

Variable	Description	Default
`NODE_ENV`	Environment mode	development
`CORS_ORIGIN`	CORS origin	*
`RATE_LIMIT_WINDOW_MS`	Rate limit window	60000
`RATE_LIMIT_MAX_REQUESTS`	Max requests per window	50
`ENABLE_PROVIDER_FALLBACK`	Fallback to APIs if local fails	true

* Required only if USE_LOCAL_VOICE=false and no other chat provider is configured.
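
The tables above can be read as a typed configuration object. The loader below is an illustrative sketch, not the server's actual code: the function and property names are hypothetical, the defaults are copied from the tables, and accepting "1" or "yes" as true is an assumption.

```javascript
// Parses a boolean-like string, falling back when the variable is unset.
function parseBool(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return ['true', '1', 'yes'].includes(String(value).toLowerCase());
}

// Parses a base-10 integer, falling back when unset or not numeric.
function parseIntOr(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Builds a config object from an environment map (process.env by default).
function loadConfig(env = process.env) {
  return {
    nodeEnv: env.NODE_ENV || 'development',
    port: parseIntOr(env.PORT, 3000),
    corsOrigin: env.CORS_ORIGIN || '*',
    useLocalVoice: parseBool(env.USE_LOCAL_VOICE, false),
    ttsModel: env.TTS_MODEL || 'tts_models/multilingual/multi-dataset/xtts_v2',
    whisperModel: env.WHISPER_MODEL || 'base',
    voiceDevice: env.VOICE_DEVICE || 'auto',
    ollamaEnabled: parseBool(env.OLLAMA_ENABLED, false),
    ollamaHost: env.OLLAMA_HOST || 'http://localhost:11434',
    rateLimitWindowMs: parseIntOr(env.RATE_LIMIT_WINDOW_MS, 60000),
    rateLimitMaxRequests: parseIntOr(env.RATE_LIMIT_MAX_REQUESTS, 50),
    enableProviderFallback: parseBool(env.ENABLE_PROVIDER_FALLBACK, true),
  };
}
```

Passing the environment in as a parameter keeps the loader testable without touching the real process environment.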

Project Structure

AI Voice Assistant/
├── client/                   # React frontend
│   ├── src/
│   │   ├── components/       # React components
│   │   │   ├── Layout.jsx
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── VoiceRecorder.jsx
│   │   │   ├── AudioVisualizer.jsx
│   │   │   ├── VoiceSettings.jsx
│   │   │   └── ProviderSelector.jsx
│   │   ├── contexts/         # React contexts
│   │   │   ├── ThemeContext.jsx
│   │   │   └── VoiceContext.jsx
│   │   ├── styles/
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   ├── vite.config.js
│   └── tailwind.config.js
├── server/
│   └── index.js              # Main server entry
├── tests/
│   ├── unit/                 # Unit tests
│   └── integration/          # Integration tests
├── logs/                     # Log files
├── uploads/                  # Temporary upload storage
├── audio-cache/              # Cached TTS audio
├── package.json
├── Dockerfile
├── docker-compose.yml
├── jest.config.js
├── .env.example
└── README.md

Frontend Development

The frontend is a React application. To run it in development mode:

cd client
npm install
npm run dev

Tech Stack

React 18 with Hooks
Vite for fast development
Tailwind CSS for styling
Framer Motion for animations
Socket.io Client for real-time communication
Zustand for state management

Building for Production

cd client
npm run build

This creates an optimized build in client/dist/.

Production Deployment

Docker Deployment

# Build and start all services
docker-compose up -d --build

# View logs
docker-compose logs -f

# Scale to multiple instances
docker-compose up -d --scale voice-assistant=3

Manual Deployment

# Install production dependencies only
# (--omit=dev replaces the deprecated --only=production flag)
npm ci --omit=dev

# Build frontend
cd client
npm ci
npm run build
cd ..

# Set environment variables
export NODE_ENV=production
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key

# Start server
npm start

Monitoring

Health Check: GET /health
Metrics: GET /metrics (Prometheus format)
Logs: Check logs/ directory
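
For ad hoc checks without a Prometheus server, the text returned by GET /metrics can be parsed directly. The sketch below handles the common cases of the Prometheus text exposition format (comment lines, optional labels, optional trailing timestamp). It is a simplification: it does not handle a "}" inside a quoted label value. The metric names in the usage comment are hypothetical, since this README does not list the exported metrics.

```javascript
// Converts a sample value token to a number, including the
// +Inf and -Inf spellings used by the exposition format.
function parseSampleValue(token) {
  if (token === '+Inf') return Infinity;
  if (token === '-Inf') return -Infinity;
  return Number(token);
}

// Parses exposition text into a Map of "name{labels}" -> numeric value.
function parseMetrics(text) {
  const samples = new Map();
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip HELP/TYPE comments
    const closeBrace = line.lastIndexOf('}');
    const keyEnd = closeBrace >= 0 ? closeBrace + 1 : line.search(/\s/);
    if (keyEnd <= 0) continue; // malformed line without a value
    const key = line.slice(0, keyEnd);
    const [valueToken] = line.slice(keyEnd).trim().split(/\s+/);
    if (valueToken === undefined || valueToken === '') continue;
    samples.set(key, parseSampleValue(valueToken));
  }
  return samples;
}

// Usage sketch (not executed here; metric name is hypothetical):
//   const text = await (await fetch('http://localhost:3000/metrics')).text();
//   console.log(parseMetrics(text).get('process_uptime_seconds'));
```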

Testing

# Run all tests
npm test

# Run unit tests only
npm run test:unit

# Run integration tests only
npm run test:integration

# Run frontend tests (if available)
cd client
npm test

Supported AI Providers

Chat Providers

Provider	Type	Free	Local
Ollama	Chat	✅	✅
Groq	Chat	✅	❌
Together AI	Chat	✅	❌
Cerebras	Chat	✅	❌
OpenAI	Chat, STT, TTS	❌	❌

Voice Providers

Provider	Service	Free	Local	Voice Cloning
Coqui TTS	TTS	✅ FREE	✅	✅ XTTSv2
Whisper	STT	✅ FREE	✅	-
ElevenLabs	TTS	✅ (limited)	❌	✅
OpenAI	STT, TTS	❌	❌	❌

💡 Recommendation: Use local services for unlimited free usage, and keep API providers configured as a fallback for reliability.
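
The local-first, API-as-fallback pattern recommended above (and controlled by ENABLE_PROVIDER_FALLBACK) can be sketched as an ordered list of providers tried in turn. This is an illustration under assumptions, not the project's implementation: the function name and the provider object shape ({ name, run }) are hypothetical.

```javascript
// Tries each provider in order and returns the first successful result,
// tagged with the provider that produced it. If every provider fails,
// throws one error that summarizes all failures.
async function withFallback(providers, input) {
  const errors = [];
  for (const { name, run } of providers) {
    try {
      return { provider: name, result: await run(input) };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Usage sketch (not executed here; localTts and elevenLabsTts are
// hypothetical functions that return audio for a piece of text):
//   const providers = [{ name: 'coqui-local', run: localTts }];
//   if (enableProviderFallback) providers.push({ name: 'elevenlabs', run: elevenLabsTts });
//   const { provider, result } = await withFallback(providers, 'Hello!');
```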

Security

No known vulnerabilities in dependencies (npm audit reports 0 issues)
Rate limiting enabled by default
Helmet.js security headers
CORS protection
File type validation
Non-root Docker user
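
To illustrate what RATE_LIMIT_WINDOW_MS and RATE_LIMIT_MAX_REQUESTS mean, here is a minimal fixed-window limiter. It is a sketch of the general technique only; the server may use a middleware library with different semantics, and the function name and the injectable clock are assumptions made for testability.

```javascript
// Creates a fixed-window limiter keyed by client identifier (for example
// an IP address). Defaults mirror the documented environment defaults.
function createRateLimiter({ windowMs = 60000, maxRequests = 50, now = Date.now } = {}) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key) {
    const time = now();
    const current = windows.get(key);
    if (!current || time - current.start >= windowMs) {
      // First request from this key, or the previous window has expired.
      windows.set(key, { start: time, count: 1 });
      return true;
    }
    if (current.count < maxRequests) {
      current.count += 1;
      return true;
    }
    return false; // over the limit for this window
  };
}

// Usage sketch: const allow = createRateLimiter(); allow('203.0.113.7');
```

A fixed window is simple but allows short bursts around window boundaries; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.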

Browser Support

Chrome/Edge 90+
Firefox 88+
Safari 14+

Documentation

See DEPLOYMENT_READINESS_REPORT.md for detailed production readiness assessment.

License

MIT
