Minimal setup for speech-to-text transcription and text-to-speech generation.
- Python 3.9 (pynput requires ≤3.9, other libraries require ≥3.8)
- ffmpeg (see installation instructions below)
- A Python package manager like uv.
Install the ffmpeg command-line tool, which is required for speech-to-text transcription and is available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
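Before moving on, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the project itself: the helper names `python_version_ok` and `ffmpeg_available` are hypothetical, and the version bounds are taken from the prerequisites listed above.

```python
import shutil
import sys

# Version bounds as stated in the prerequisites (assumed, not enforced by the project).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 9)


def python_version_ok(version=None):
    """Return True if the major.minor version falls within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: this Python version is outside the 3.8 to 3.9 range.")
    if not ffmpeg_available():
        print("ffmpeg was not found on PATH; install it with one of the commands above.")
```

Running the script prints nothing when both checks pass.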
Install Python dependencies:
uv sync
Run the application:
uv run python main.py
Controls:
- Press Space to start/stop recording
- Press ESC to exit the application
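The controls above could be wired up with pynput roughly as follows. This is a sketch under assumptions rather than the code in `main.py`: the `RecorderControls` class and the `run_listener` function are hypothetical names, and the actual audio capture is left out so that only the key handling is shown.

```python
class RecorderControls:
    """Tracks recording state in response to key names (hypothetical helper)."""

    def __init__(self):
        self.recording = False
        self.running = True

    def handle_key(self, key_name):
        """Update state for a key name; return False once the app should exit."""
        if key_name == "space":
            # Space toggles between recording and idle.
            self.recording = not self.recording
        elif key_name == "esc":
            # ESC stops any recording in progress and ends the session.
            self.recording = False
            self.running = False
        return self.running


def run_listener(controls):
    """Block until ESC is pressed, forwarding key presses to `controls`."""
    # Imported lazily because pynput needs a desktop session to load.
    from pynput import keyboard

    def on_press(key):
        # Special keys are keyboard.Key members with a .name; character keys lack one.
        name = getattr(key, "name", None)
        if name in ("space", "esc"):
            # Returning False from the callback stops the pynput listener.
            return controls.handle_key(name)

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()
```

Calling `run_listener(RecorderControls())` starts listening. Keeping the state in a plain class means the toggle logic can be exercised without a keyboard or display attached.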
Built with:
- Python - Core language
- OpenAI Whisper API - Speech-to-text transcription
- PyAudio - Audio recording from microphone
- pynput - Keyboard event handling
- bark - Text-to-speech generation
- uv - Python package management