A real-time voice-enabled chatbot that uses speech recognition, Google's Gemini Pro AI, and text-to-speech synthesis for natural conversations.
- 🎤 Real-time speech recognition
- 🤖 AI-powered responses using Google's Gemini Pro
- 🔊 Natural text-to-speech output
- 💬 Interactive chat interface
- ⚡ Real-time response handling
- HTML5
- CSS3
- JavaScript (ES6+)
- Web Speech API
  - SpeechRecognition
  - SpeechSynthesis
- Python 3.x
- Flask
- Google Generative AI (Gemini Pro)
- Flask-CORS
- Python 3.x
- Node.js and npm
- Google AI API key
- Modern web browser (Chrome recommended)
- Clone the repository

      git clone <repository-url>
      cd voice-chatbot

- Backend Setup

      # Create and activate virtual environment (optional but recommended)
      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

      # Install dependencies
      pip install -r requirements.txt

- Environment Configuration

      # Create .env file in the root directory
      echo "GOOGLE_AI_API_KEY=your_api_key_here" > .env
- Frontend Setup

      # Install the frontend dependencies (including Vite)
      npm install
- Start the Backend Server

      python app.py

  The server will start on http://localhost:5000

- Start the Frontend Development Server

      npm run dev

  The application will be available at http://localhost:5173
SpeechBot/
├── frontend/
│   ├── index.html        # Main HTML file
│   ├── style.css         # Styles
│   └── main.js           # Frontend logic
├── backend/
│   ├── app.py            # Flask server
│   └── requirements.txt  # Python dependencies
├── .env                  # Environment variables
└── README.md             # Project documentation
- Uses Web Speech API's SpeechRecognition
- Continuous listening mode
- Error handling for unsupported browsers
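
The three points above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.js`: the helper name `createRecognizer`, its callback parameters, the `"unsupported-browser"` code, and the `en-US` language are all assumptions. The global object is passed in as an argument so that the feature check can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns a configured recognizer, or null when the
// environment exposes neither SpeechRecognition nor webkitSpeechRecognition.
function createRecognizer(globalObj, onTranscript, onError) {
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Recognition) {
    onError("unsupported-browser"); // illustrative code, not part of the API
    return null;
  }

  const recognizer = new Recognition();
  recognizer.continuous = true;      // keep listening after each phrase
  recognizer.interimResults = false; // deliver final transcripts only
  recognizer.lang = "en-US";         // assumed language; adjust as needed

  recognizer.onresult = (event) => {
    // Use the most recent result; each result holds ranked alternatives.
    const latest = event.results[event.results.length - 1];
    if (latest.isFinal) {
      onTranscript(latest[0].transcript.trim());
    }
  };
  recognizer.onerror = (event) => onError(event.error);

  return recognizer;
}

// In a browser:
// const recognizer = createRecognizer(window, console.log, console.error);
// if (recognizer) recognizer.start();
```

Returning `null` instead of throwing lets the calling code disable the "Start Listening" button and show a message when the API is missing.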
- Integrates with Google's Gemini Pro
- Processes natural language input
- Generates contextual responses
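
This README does not document the backend's HTTP interface, so the following is only a sketch of how the frontend might pass a transcript to the Flask server that wraps Gemini Pro. The `/chat` route and the `message` and `response` JSON field names are assumptions; only the `http://localhost:5000` address comes from this document.

```javascript
// Backend address from this README plus a hypothetical "/chat" route.
const BACKEND_URL = "http://localhost:5000/chat";

// Builds the fetch() arguments for one user message.
function buildChatRequest(message) {
  return {
    url: BACKEND_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }), // "message" field name is assumed
    },
  };
}

// Sends the message and resolves with the reply text.
// fetchImpl is injectable so the function can be tested without a server.
async function askBackend(message, fetchImpl = fetch) {
  const { url, options } = buildChatRequest(message);
  const res = await fetchImpl(url, options);
  if (!res.ok) {
    throw new Error(`Backend returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // "response" field name is assumed
}
```

Keeping the request construction separate from the network call makes it easy to adjust the route or payload once the real backend contract is known.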
- Natural voice synthesis
- Configurable speech parameters
- Queue management for responses
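
As a rough illustration of the points above, the sketch below sets the standard `rate`, `pitch`, `volume` and `lang` properties on a `SpeechSynthesisUtterance` and relies on the browser's built-in utterance queue. The function names, the default values, and the injected `synth` and `Utterance` parameters are assumptions made for this example; the project's own code may differ.

```javascript
// Hypothetical defaults for the configurable speech parameters.
const DEFAULT_VOICE_OPTIONS = { rate: 1.0, pitch: 1.0, volume: 1.0, lang: "en-US" };

// Speaks one response. synth and Utterance are passed in so the function
// can be exercised without a browser.
function speakResponse(text, options, synth, Utterance) {
  const settings = { ...DEFAULT_VOICE_OPTIONS, ...options };
  const utterance = new Utterance(text);
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  utterance.volume = settings.volume;
  utterance.lang = settings.lang;
  // speak() appends to the synthesizer's own queue, so successive
  // responses play in order instead of overlapping.
  synth.speak(utterance);
  return utterance;
}

// Drops the utterance being spoken and everything still queued.
function stopSpeaking(synth) {
  synth.cancel();
}

// In a browser:
// speakResponse("Hello!", { rate: 1.1 }, window.speechSynthesis, SpeechSynthesisUtterance);
```

Calling `stopSpeaking` before starting a new listening session is one way to avoid the recognizer picking up the bot's own voice.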
- Click the "Start Listening" button
- Speak your message
- Wait for the AI response
- The response will be displayed and spoken back
The application handles various error scenarios:
- Speech recognition errors
- Network connectivity issues
- API errors
- Browser compatibility issues
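
One way to turn these scenarios into user-facing messages is sketched below. The recognition codes `no-speech`, `audio-capture`, `not-allowed` and `network` are values of the Web Speech API's `SpeechRecognitionErrorEvent.error`; the `unsupported-browser` code, the function names, and the message wording are assumptions for illustration.

```javascript
// Standard SpeechRecognition error codes, plus a hypothetical
// "unsupported-browser" code for missing API support.
const RECOGNITION_ERRORS = new Map([
  ["no-speech", "No speech was detected. Please try again."],
  ["audio-capture", "No microphone could be accessed."],
  ["not-allowed", "Microphone permission was denied."],
  ["network", "Speech recognition lost its network connection."],
  ["unsupported-browser", "This browser does not support speech recognition."],
]);

// Maps a recognition error code to a readable message.
function describeRecognitionError(code) {
  return RECOGNITION_ERRORS.get(code) || `Speech recognition failed (${code}).`;
}

// fetch() rejects with a TypeError when the request never reaches the
// server; any other error is treated here as an API-level failure.
function describeBackendError(error) {
  if (error instanceof TypeError) {
    return "Could not reach the backend. Is the server running?";
  }
  return `The AI service returned an error: ${error.message}`;
}
```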
- API key protection using environment variables
- CORS configuration for secure communication
- Input validation on both frontend and backend
- No persistent data storage
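
The frontend half of the input validation could look something like the sketch below. The 2000-character limit, the function name, and the shape of the return value are assumptions; the backend would still need to repeat equivalent checks, since client-side validation can be bypassed.

```javascript
// Assumed upper bound on message length; tune to the backend's limits.
const MAX_MESSAGE_LENGTH = 2000;

// Returns { ok: true, value } for usable input, or { ok: false, error }.
function validateMessage(input) {
  if (typeof input !== "string") {
    return { ok: false, error: "Message must be text." };
  }
  const text = input.trim();
  if (text.length === 0) {
    return { ok: false, error: "Message is empty." };
  }
  if (text.length > MAX_MESSAGE_LENGTH) {
    return { ok: false, error: `Message exceeds ${MAX_MESSAGE_LENGTH} characters.` };
  }
  return { ok: true, value: text };
}
```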
Recommended browsers:
- Google Chrome (preferred)
- Microsoft Edge
- Firefox (speech output only; SpeechRecognition is not enabled by default)
- Safari (limited support)
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google AI for Gemini Pro API
- Web Speech API contributors
- Flask and its community
- Speech Recognition Not Working
  - Ensure you're using a supported browser
  - Check microphone permissions
  - Verify stable internet connection
- API Key Issues
  - Verify .env file configuration
  - Check API key validity
  - Ensure proper environment variable loading
- Backend Connection Failed
  - Confirm backend server is running
  - Check CORS configuration
  - Verify correct port settings
For issues and feature requests, please create an issue in the repository.