pabvald/echo-voice

Echo-voice

Minimal setup for speech-to-text transcription and text-to-speech generation.

Requirements

  • Python 3.9 (pynput requires ≤3.9, other libraries require ≥3.8)
  • ffmpeg (see installation instructions below)
  • A Python package manager such as uv (the commands below assume uv)
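The requirements above can be checked before launching the application. The following is a minimal sketch using only the standard library; the function name `check_requirements` and its parameters are hypothetical and not part of this repository.

```python
import shutil
import sys


def check_requirements(version_info=None, which=shutil.which):
    """Return a list of problems found; an empty list means everything looks fine.

    Both parameters are hypothetical hooks so the check can be exercised
    without changing the real interpreter or PATH.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    major_minor = tuple(version_info[:2])
    # The README asks for Python 3.9: pynput needs <=3.9, other libraries >=3.8.
    if not ((3, 8) <= major_minor <= (3, 9)):
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} is outside the 3.8-3.9 range"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Problem:", problem)
```

Returning a list rather than raising on the first failure lets all missing requirements be reported in one pass.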

Instructions

  1. Install the ffmpeg command-line tool, which is required for speech-to-text transcription. It is available from most package managers:

    # on Ubuntu or Debian
    sudo apt update && sudo apt install ffmpeg
    
    # on Arch Linux
    sudo pacman -S ffmpeg
    
    # on macOS using Homebrew (https://brew.sh/)
    brew install ffmpeg
    
    # on Windows using Chocolatey (https://chocolatey.org/)
    choco install ffmpeg
    
    # on Windows using Scoop (https://scoop.sh/)
    scoop install ffmpeg
    
  2. Install Python dependencies:

    uv sync

  3. Run the application:

    uv run python main.py

  4. Controls:

    • Press Space to start/stop recording
    • Press ESC to exit the application
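The controls above amount to a small state machine: Space toggles recording, ESC ends the session. The sketch below illustrates that logic on plain key names so it runs without a keyboard listener. The class `RecorderControls` and the key strings are hypothetical and do not describe how `main.py` is actually written.

```python
class RecorderControls:
    """Tracks recording state from key names (hypothetical helper)."""

    def __init__(self):
        self.recording = False  # True while audio is being captured
        self.running = True     # False once the user asked to exit

    def handle_key(self, key_name):
        """Apply one key press and return a short label for the action taken."""
        if key_name == "space":
            # Space toggles between recording and idle.
            self.recording = not self.recording
            return "start" if self.recording else "stop"
        if key_name == "esc":
            # ESC stops any recording in progress and ends the session.
            self.recording = False
            self.running = False
            return "exit"
        return "ignored"


if __name__ == "__main__":
    controls = RecorderControls()
    for key in ["space", "space", "esc"]:
        print(key, "->", controls.handle_key(key))
```

In an application built on pynput, a handler like this would typically be called from the `on_press` callback of `pynput.keyboard.Listener`, comparing against `keyboard.Key.space` and `keyboard.Key.esc`; returning `False` from that callback stops the listener.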

Tech Stack

  • Python - Core language
  • OpenAI Whisper API - Speech-to-text transcription
  • PyAudio - Audio recording from microphone
  • pynput - Keyboard event handling
  • bark - Text-to-speech generation
  • uv - Python package management
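Between the recording and transcription steps, captured audio is commonly saved as a WAV file. The sketch below shows one way to do that with the standard-library `wave` module, given raw PCM byte chunks such as those a microphone stream yields. The function name `write_wav` and the defaults (mono, 16-bit samples, 16 kHz) are assumptions for illustration, not settings taken from this repository.

```python
import wave


def write_wav(path, frames, channels=1, sample_width=2, rate=16000):
    """Write raw PCM byte chunks to a WAV file and return its duration in seconds.

    Defaults are assumed values: mono, 2 bytes per sample (16-bit), 16 kHz.
    """
    data = b"".join(frames)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(data)
    # Each frame holds one sample per channel, sample_width bytes each.
    return len(data) / (channels * sample_width * rate)
```

The returned duration is a convenient sanity check that the recording is not empty before it is passed on for transcription.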
