seraphonixstudios/AI-Voice-Assistant

AI Voice Assistant

An AI voice assistant with speech recognition, text-to-speech, voice cloning, and real-time conversation. It includes a React frontend with real-time voice visualization.

🎉 Now with FREE Open Source Voice Services! Use Coqui TTS and Whisper locally - no API costs, complete privacy, fully offline capable.

Voice Assistant Preview

Features

Backend

  • Speech-to-Text: OpenAI Whisper API OR free local Whisper
  • Text-to-Speech: ElevenLabs API OR free local Coqui TTS with voice cloning
  • Voice Cloning: Clone voices from audio samples (XTTSv2)
  • AI Conversations: Multi-provider support (OpenAI, Groq, Ollama, Together, Cerebras)
  • Real-time Streaming: WebSocket support for live voice interactions
  • Production Ready: Rate limiting, metrics, logging
  • Zero-Cost Mode: Run completely offline with open source models

Frontend

  • Real-time Voice Visualization - Animated waveform display
  • Chat Interface - Beautiful message history with markdown support
  • Voice Settings - Control pitch, speed, stability, and similarity boost
  • Multi-language Support - 10+ languages
  • Provider Selector - Choose your AI backend
  • Dark/Light Mode - With system preference detection
  • Export Conversations - Download chat history
  • Premium Animations - Smooth, stunning UI with Framer Motion
  • Fully Responsive - Works on desktop, tablet, and mobile

🆓 Free Open Source Voice Services (New!)

Run voice synthesis and recognition completely free using open source models:

| Feature | API Version | Open Source Version |
|---|---|---|
| TTS | ElevenLabs ($0.015/1K chars) | Coqui TTS (FREE) |
| STT | OpenAI Whisper ($0.006/min) | Whisper Local (FREE) |
| Voice Cloning | ElevenLabs | XTTSv2 (FREE) |
| Languages | 29+ | 1100+ languages |
| Privacy | Cloud processing | 100% local |
| Offline | ❌ Requires internet | Fully offline |
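
To make the price comparison concrete, the sketch below estimates API spend from the two rates quoted in the table. The function name and the usage figures are hypothetical, and the rates are simply copied from this README, so check current provider pricing before relying on them.

```python
# Rates copied from the comparison table; providers may change them.
TTS_USD_PER_1K_CHARS = 0.015  # ElevenLabs text-to-speech
STT_USD_PER_MINUTE = 0.006    # OpenAI Whisper API speech-to-text


def estimate_api_cost(tts_chars: int, stt_minutes: float) -> float:
    """Return the estimated API cost in USD, rounded to cents."""
    tts_cost = tts_chars / 1000 * TTS_USD_PER_1K_CHARS
    stt_cost = stt_minutes * STT_USD_PER_MINUTE
    return round(tts_cost + stt_cost, 2)


# Hypothetical month: 500,000 synthesized characters and 1,000 minutes of audio.
# TTS: 500 * 0.015 = 7.50, STT: 1000 * 0.006 = 6.00
print(estimate_api_cost(500_000, 1_000))  # 13.5
```

With the local services enabled, both terms drop to zero and the remaining cost is local compute.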

Quick Setup (Open Source)

# 1. Run the setup script
setup-local-voice.bat  # Windows
# OR
./setup-local-voice.sh # Linux/Mac

# 2. Enable in .env file
USE_LOCAL_VOICE=true
WHISPER_MODEL=base
TTS_MODEL=tts_models/multilingual/multi-dataset/xtts_v2

# 3. Start as normal
npm start

📖 Full Guide: LOCAL_VOICE_SETUP.md

Quick Start (Traditional)

Prerequisites

  • Node.js 18+
  • Python 3.9+ (for open source voice services)
  • API Keys (optional if using local voice services):
    • OpenAI API key
    • ElevenLabs API key (optional, for voice cloning)

Installation

cd "AI Voice Assistant"
npm install

# Setup environment
cp .env.example .env
# Edit .env and add your API keys (or use local voice)

# Start backend server
npm run dev

# In a new terminal, start frontend
cd client
npm install
npm run dev

The frontend will be available at http://localhost:5173 and the backend at http://localhost:3000.

API Documentation

Speech-to-Text

POST /api/speech-to-text
Content-Type: multipart/form-data

audio: <audio-file>
language: en
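
As an illustration, a client could assemble this multipart request with only the Python standard library. This is a minimal sketch, not the project's client: `build_stt_request`, the default file name, and the `audio/wav` part type are assumptions, and the response format is not documented here.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:3000"  # backend address from the Quick Start


def build_stt_request(audio_bytes, filename="sample.wav", language="en",
                      content_type="audio/wav"):
    """Build a multipart/form-data POST for /api/speech-to-text."""
    boundary = uuid.uuid4().hex
    # Text field "language", then the header of the file field "audio".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/speech-to-text",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


request = build_stt_request(b"RIFF....WAVE")  # placeholder bytes, not real audio
# Sending it requires a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.read())
```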

Text-to-Speech

POST /api/text-to-speech
Content-Type: application/json

{
  "text": "Hello, how are you?",
  "voice": "Rachel",
  "stability": 0.5,
  "similarity_boost": 0.75
}
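
The request above can be built from Python as follows. This is a hedged sketch: `build_tts_request` is a hypothetical helper, and the 0 to 1 range check mirrors ElevenLabs' convention for `stability` and `similarity_boost` rather than anything this server is documented to enforce.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # backend address from the Quick Start


def build_tts_request(text, voice="Rachel", stability=0.5, similarity_boost=0.75):
    """Build the JSON POST for /api/text-to-speech using the documented fields."""
    # Assumption: both voice settings are fractions between 0 and 1.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    payload = {
        "text": text,
        "voice": voice,
        "stability": stability,
        "similarity_boost": similarity_boost,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tts_request("Hello, how are you?")
# Sending it requires a running backend:
# with urllib.request.urlopen(request) as response:
#     audio = response.read()
```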

AI Chat

POST /api/chat
Content-Type: application/json

{
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "model": "gpt-4",
  "temperature": 0.7
}
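
A chat request with the fields shown above might be assembled like this. The helper name `build_chat_request` is hypothetical, and the sketch makes no assumption about the shape of the server's reply.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # backend address from the Quick Start


def build_chat_request(messages, model="gpt-4", temperature=0.7):
    """Build the JSON POST for /api/chat from a list of role/content messages."""
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError(f"Message needs 'role' and 'content': {message}")
    payload = {
        "messages": list(messages),
        "model": model,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


history = [{"role": "user", "content": "Hello!"}]
request = build_chat_request(history)
# Sending it requires a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```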

Get Voices

GET /api/voices

Get Providers

GET /api/providers

WebSocket Real-time API

Connect to ws://localhost:3000 for real-time voice conversations.

Events

Client -> Server:

  • audio-stream: Send audio chunks
  • text-message: Send text messages

Server -> Client:

  • session: Session ID assignment
  • transcription: Transcribed text
  • ai-response: AI text response
  • audio-response: AI audio response (base64)
  • error: Error messages
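
One way a client might process the server events listed above is a small dispatcher that folds each event into local state. The event names come from this README, but the payload keys (`sessionId`, `text`, `audio`, `message`) are assumptions made for illustration, since the payload shapes are not documented here.

```python
import base64


def make_client_state():
    """Create an empty state record for one conversation."""
    return {"session_id": None, "transcript": [], "replies": [],
            "audio": b"", "errors": []}


def handle_server_event(state, event, payload):
    """Apply one server event to the state. Payload keys are hypothetical."""
    if event == "session":
        state["session_id"] = payload["sessionId"]
    elif event == "transcription":
        state["transcript"].append(payload["text"])
    elif event == "ai-response":
        state["replies"].append(payload["text"])
    elif event == "audio-response":
        # The README states that audio responses are base64-encoded.
        state["audio"] += base64.b64decode(payload["audio"])
    elif event == "error":
        state["errors"].append(payload["message"])
    else:
        raise ValueError(f"Unknown event: {event}")
    return state


state = make_client_state()
handle_server_event(state, "session", {"sessionId": "abc123"})
handle_server_event(state, "audio-response",
                    {"audio": base64.b64encode(b"\x00\x01").decode("ascii")})
print(state["session_id"], len(state["audio"]))
```

Keeping the dispatcher free of network code lets the same function sit behind whichever client library delivers the events.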

Environment Variables

Local Voice Services (Open Source)

| Variable | Description | Default |
|---|---|---|
| USE_LOCAL_VOICE | Enable local TTS/STT | false |
| TTS_MODEL | Coqui TTS model | tts_models/multilingual/multi-dataset/xtts_v2 |
| WHISPER_MODEL | Whisper model (tiny/base/small/medium/large/turbo) | base |
| VOICE_DEVICE | Compute device (auto/cpu/cuda) | auto |
API Providers (Optional)

| Variable | Description | Default |
|---|---|---|
| PORT | Server port | 3000 |
| OPENAI_API_KEY | OpenAI API key | Optional* |
| ELEVENLABS_API_KEY | ElevenLabs API key | Optional |
| GROQ_API_KEY | Groq API key | Optional |
| TOGETHER_API_KEY | Together AI key | Optional |
| CEREBRAS_API_KEY | Cerebras API key | Optional |
| OLLAMA_ENABLED | Enable Ollama local LLM | false |
| OLLAMA_HOST | Ollama server URL | http://localhost:11434 |

General Settings

| Variable | Description | Default |
|---|---|---|
| NODE_ENV | Environment mode | development |
| CORS_ORIGIN | CORS origin | * |
| RATE_LIMIT_WINDOW_MS | Rate limit window | 60000 |
| RATE_LIMIT_MAX_REQUESTS | Max requests per window | 50 |
| ENABLE_PROVIDER_FALLBACK | Fallback to APIs if local fails | true |

* Required only if USE_LOCAL_VOICE=false and no other chat provider configured
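
To show what the two rate-limit settings mean together, here is a fixed-window limiter using the default values of 60000 ms and 50 requests. This README does not say which algorithm the server uses, so the class below is an assumption-laden illustration of the settings, not the server's implementation.

```python
class FixedWindowLimiter:
    """Allow at most max_requests per client in each window_ms window."""

    def __init__(self, window_ms=60_000, max_requests=50):
        self.window_ms = window_ms
        self.max_requests = max_requests
        self._windows = {}  # client_id -> (window_start_ms, request_count)

    def allow(self, client_id, now_ms):
        """Return True if the request is within the limit, else False."""
        start, count = self._windows.get(client_id, (now_ms, 0))
        if now_ms - start >= self.window_ms:
            start, count = now_ms, 0  # the previous window expired
        if count >= self.max_requests:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True


limiter = FixedWindowLimiter()
results = [limiter.allow("client-a", now_ms=0) for _ in range(51)]
print(results.count(True), results[-1])  # 50 False
```

Under these defaults the 51st request inside one minute is rejected, and the counter resets once 60000 ms have passed since the window started.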

Project Structure

AI Voice Assistant/
├── client/                   # React frontend
│   ├── src/
│   │   ├── components/       # React components
│   │   │   ├── Layout.jsx
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── VoiceRecorder.jsx
│   │   │   ├── AudioVisualizer.jsx
│   │   │   ├── VoiceSettings.jsx
│   │   │   └── ProviderSelector.jsx
│   │   ├── contexts/         # React contexts
│   │   │   ├── ThemeContext.jsx
│   │   │   └── VoiceContext.jsx
│   │   ├── styles/
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── package.json
│   ├── vite.config.js
│   └── tailwind.config.js
├── server/
│   └── index.js              # Main server entry
├── tests/
│   ├── unit/                 # Unit tests
│   └── integration/          # Integration tests
├── logs/                     # Log files
├── uploads/                  # Temporary upload storage
├── audio-cache/              # Cached TTS audio
├── package.json
├── Dockerfile
├── docker-compose.yml
├── jest.config.js
├── .env.example
└── README.md

Frontend Development

The frontend is a React app. To run it in development mode:

cd client
npm install
npm run dev

Tech Stack

  • React 18 with Hooks
  • Vite for fast development
  • Tailwind CSS for styling
  • Framer Motion for animations
  • Socket.io Client for real-time communication
  • Zustand for state management

Building for Production

cd client
npm run build

This creates an optimized build in client/dist/.

Production Deployment

Docker Deployment

# Build and start all services
docker-compose up -d --build

# View logs
docker-compose logs -f

# Scale to multiple instances
docker-compose up -d --scale voice-assistant=3

Manual Deployment

# Install production dependencies only
# (--omit=dev replaces the deprecated --only=production flag)
npm ci --omit=dev

# Build frontend
cd client
npm ci
npm run build
cd ..

# Set environment variables
export NODE_ENV=production
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key

# Start server
npm start

Monitoring

  • Health Check: GET /health
  • Metrics: GET /metrics (Prometheus format)
  • Logs: Check logs/ directory
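
Because /metrics is served in the Prometheus text format, its output can be inspected without a Prometheus server. The parser below is a simplified sketch that covers plain and labeled samples. The metric names in the example are made up, since this README does not list the metrics the server exports.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition into a {series: value} dict."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labeled sample: keep 'name{labels}' together as the key.
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # an optional timestamp may follow
    return samples


# Hypothetical scrape output, for illustration only.
example = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET"} 12
process_uptime_seconds 34.5
"""
print(parse_prometheus_text(example))
```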

Testing

# Run all tests
npm test

# Run unit tests only
npm run test:unit

# Run integration tests only
npm run test:integration

# Run frontend tests (if available)
cd client
npm test

Supported AI Providers

Chat Providers

| Provider | Type | Free | Local |
|---|---|---|---|
| Ollama | Chat | | |
| Groq | Chat | | |
| Together AI | Chat | | |
| Cerebras | Chat | | |
| OpenAI | Chat, STT, TTS | | |

Voice Providers

Provider Service Free Local Voice Cloning
Coqui TTS TTS FREE ✅ XTTSv2
Whisper STT FREE -
ElevenLabs TTS ✅ (limited)
OpenAI STT, TTS

💡 Recommendation: Use local services for unlimited free usage, and keep API providers configured as a fallback for reliability.
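
The local-first pattern with an API fallback can be sketched as an ordered list of providers tried in turn. Everything here is illustrative: `synthesize_with_fallback` and the two stub providers are hypothetical, and the `enable_fallback` argument only mirrors the intent of the ENABLE_PROVIDER_FALLBACK setting.

```python
def synthesize_with_fallback(text, providers, enable_fallback=True):
    """Try each (name, function) provider in order; return (name, result)."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{name}: {exc}")
            if not enable_fallback:
                break  # fallback disabled: stop after the first failure
    raise RuntimeError("All providers failed: " + "; ".join(errors))


def local_tts(text):
    """Stub standing in for a local model that is unavailable."""
    raise RuntimeError("model not loaded")


def api_tts(text):
    """Stub standing in for a cloud TTS call."""
    return b"audio:" + text.encode("utf-8")


print(synthesize_with_fallback("hi", [("local", local_tts), ("api", api_tts)]))
```

Ordering the list with the local provider first keeps API usage, and therefore cost, limited to the cases where the local model fails.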

Security

  • No known vulnerabilities reported by npm audit (0 issues)
  • Rate limiting enabled by default
  • Helmet.js security headers
  • CORS protection
  • File type validation
  • Non-root Docker user

Browser Support

  • Chrome/Edge 90+
  • Firefox 88+
  • Safari 14+

Documentation

See DEPLOYMENT_READINESS_REPORT.md for a detailed production-readiness assessment.

License

MIT
