A signal-controlled audio transcription daemon in Python powered by locally-running OpenAI Whisper or NVIDIA Parakeet. Record audio on demand with system signals and get transcribed text instantly copied to your clipboard.
Note: This tool is designed for Linux systems only.
This project is a Python rewrite of the original whispy by @daaku. The name "whispypy" comes from the original "whispy" + "py" for Python. Special thanks to daaku for the original implementation and inspiration.
Beeps are from LaSonotheque of Joseph Sardin.
Whispypy sounds like "ouistiti" (French for marmoset)
- Signal-controlled recording (start/stop with SIGUSR2)
- Audio device discovery and testing
- Multiple transcription engines:
  - OpenAI Whisper (default): Multiple model sizes (tiny to large-v3)
  - NVIDIA Parakeet: High-performance ASR model (nvidia/parakeet-tdt-0.6b-v3)
  - NVIDIA Parakeet INT8 (Sherpa-ONNX): CPU-friendly ONNX engine (auto-downloads a prebuilt model bundle on first run)
- Automatic clipboard integration (Wayland/X11)
- Auto-paste functionality (automatically paste transcribed text)
- Configurable audio input devices with persistent configuration
- Optional audio file retention
- Python 3.13+
- Transcription engines:
  - OpenAI Whisper (default, always available)
  - NVIDIA Parakeet (optional): Requires nemo_toolkit[asr]
  - NVIDIA Parakeet INT8 (optional): Requires sherpa-onnx
- Audio system:
  - PipeWire (preferred): pw-record, pw-cli
  - ALSA (fallback): arecord
- Clipboard tools:
  - Wayland: wl-copy
  - X11: xclip or xsel
- Auto-paste tools (optional, for the --autopaste feature):
  - Wayland: wtype, ydotool or dotool (especially for non-QWERTY layouts)
  - X11: xdotool
- Auto-download tools (optional, for Parakeet INT8 bundle download):
  - curl (preferred) or wget, plus tar
# Clone the repository
git clone git@github.com:rangzen/whispypy.git
cd whispypy
# Install dependencies using UV (recommended)
uv sync
# Install UV first if you don't have it installed
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
To use NVIDIA Parakeet for transcription, install the NeMo toolkit:
# Install PyTorch and related packages for CPU-only systems (or with CUDA if you have a GPU)
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install NeMo toolkit with ASR support (required for Parakeet)
uv pip install nemo_toolkit[asr]
# Install specific ONNX version to avoid compatibility issues
uv pip install onnx==1.18.0
Note: The NeMo installation is large (~2GB) and may take some time. Whisper works out of the box without additional dependencies. See onnx/onnx#7249 for ONNX installation issues.
This engine runs Parakeet using Sherpa-ONNX and an INT8 ONNX model bundle.
Install:
# Install optional extra
uv sync --extra parakeet-onnx
# Alternative
uv pip install ".[parakeet-onnx]"
Verify (recommended):
This will load the model (and auto-download it on first run) and then exit.
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 --check-model
First-run download behavior:
- If you do not pass --parakeet-onnx-dir, whispypy downloads the bundle from the k2-fsa/sherpa-onnx GitHub Releases (tag asr-models) using curl (or wget) and extracts it with tar.
- Default bundle id: sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8
- Cache location: ${XDG_CACHE_HOME:-~/.cache}/whispypy/models/
--parakeet-onnx-dir is optional and intended as an override to use a pre-downloaded bundle.
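The cache location above uses shell default-value syntax. As a rough illustration of how that path resolves, here is a minimal Python sketch. The function name default_model_cache_dir is hypothetical and is not taken from the whispypy source; it only mirrors the documented expression.

```python
import os
from pathlib import Path


def default_model_cache_dir(env=None):
    """Resolve ${XDG_CACHE_HOME:-~/.cache}/whispypy/models/ as a Path.

    Hypothetical helper: mirrors the documented expression only.
    """
    env = os.environ if env is None else env
    # Like the shell's ${VAR:-default}, fall back to ~/.cache when
    # XDG_CACHE_HOME is unset or empty.
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "whispypy" / "models"


if __name__ == "__main__":
    print(default_model_cache_dir())
```

Running it prints the directory where a downloaded bundle would be expected on your machine, which can help when checking whether a first-run download already happened.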
Selecting a different bundle:
# Use a specific Sherpa-ONNX bundle id as the positional argument
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 sherpa-onnx-... -d "your_device_name"
# Or via flag (when not using positional)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 --parakeet-onnx-model-id sherpa-onnx-...
CUDA (optional):
If your sherpa-onnx installation includes CUDA support, you can request it with:
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 --onnx-provider cuda
If CUDA isn't available, whispypy will fall back to CPU.
- Discover audio devices:
  uv run python test_audio_devices.py
- Start the daemon:
  # Default (Whisper engine) - first time saves device to config file
  uv run python whispypy-daemon.py -d "your_device_name"
  # Subsequent runs - automatically uses saved device
  uv run python whispypy-daemon.py
  # Use Parakeet engine (requires NeMo installation)
  uv run python whispypy-daemon.py --engine parakeet nvidia/parakeet-tdt-0.6b-v3 -d "your_device_name"
  # Use Parakeet INT8 via Sherpa-ONNX (auto-download model bundle on first run)
  uv run python whispypy-daemon.py --engine parakeet_onnx_int8 -d "your_device_name"
- Control recording:
  # Start/stop recording (manual)
  kill -USR2 <daemon_pid>
  # Or use the convenience script (automatic PID detection)
  ./send_signal.sh
  # Stop daemon
  kill -SIGINT <daemon_pid>
- Add a shortcut key:
  Use your desktop environment's keyboard settings to bind a key combination to run ./send_signal.sh or pkill -SIGUSR2 -f whispypy-daemon.py.
  On Ubuntu, you can create a custom shortcut in Settings > Keyboard > Keyboard Shortcuts > View and Customize Shortcuts > Custom Shortcuts. Click the "+" button, name it "Whispypy Toggle Recording", and set the command to the full path of send_signal.sh or the pkill command.
  E.g. for me, sh -c -- "~/sources/whispypy/send_signal.sh".
  Then assign your desired key combination, e.g., Ctrl+Shift+t (t like talk).
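If you would rather bind the shortcut to a Python script than to send_signal.sh or pkill, the same idea can be sketched as below. This is an assumption-laden illustration, not the contents of send_signal.sh: parse_pids and send_toggle are hypothetical names, and the sketch relies on the pgrep command being installed.

```python
import os
import signal
import subprocess


def parse_pids(pgrep_output):
    """Turn pgrep's newline-separated output into a list of integer PIDs."""
    return [int(tok) for tok in pgrep_output.split() if tok.isdigit()]


def send_toggle(pattern="whispypy-daemon.py"):
    """Send SIGUSR2 to every process whose command line matches pattern.

    Hypothetical equivalent of: pkill -SIGUSR2 -f whispypy-daemon.py
    """
    # pgrep -f matches against the full command line, like pkill -f.
    result = subprocess.run(
        ["pgrep", "-f", pattern], capture_output=True, text=True, check=False
    )
    # Never signal ourselves, in case our own command line matches.
    pids = [pid for pid in parse_pids(result.stdout) if pid != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGUSR2)
    return pids
```

Calling send_toggle() once starts recording and calling it again stops it, exactly like pressing the shortcut twice. It returns the list of PIDs it signalled, so an empty list means no daemon was found.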
Run the audio device test script to discover and test available audio devices:
uv run python test_audio_devices.py
This will:
- Automatically discover all available audio input devices
- Let you test specific devices or all devices
- Show which devices are working and their signal strength
- Provide the exact device name to copy
Example output:
Found 2 audio input device(s):
1. Raptor Lake-P/U/H cAVS Headphones Stereo Microphone
   Device: alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp__source
2. Raptor Lake-P/U/H cAVS Digital Microphone
   Device: alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source
=== Test Results Summary ===
Found 1 working device(s):
1. Raptor Lake-P/U/H cAVS Digital Microphone
   Device: alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source
   Signal strength (RMS): 0.003627
Copy this device name for your whisper daemon:
alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source
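The "Signal strength (RMS)" figure in that output is a root-mean-square amplitude. The test script's own computation is not shown here, so the following is only a generic sketch of the formula, assuming samples are floats in the range -1 to 1; the function name rms is illustrative.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    samples = list(samples)
    if not samples:
        # An empty capture has no signal at all.
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    # Made-up low-level samples, roughly the magnitude of a quiet room.
    quiet = [0.003, -0.004, 0.0035, -0.0038]
    print(f"Signal strength (RMS): {rms(quiet):.6f}")
```

A value very close to zero suggests the device is delivering silence, which is a plausible reason for a device to be listed as found but not as working.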
Copy the working device name from step 1 and use it with the daemon:
# First time - specify device and save to config (default Whisper engine)
uv run python whispypy-daemon.py --device "alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source"
# Subsequent runs - device loaded automatically from config
uv run python whispypy-daemon.py
# Or with short flag for first time
uv run python whispypy-daemon.py -d "alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source"
# Use Parakeet engine (requires NeMo installation)
uv run python whispypy-daemon.py --engine parakeet nvidia/parakeet-tdt-0.6b-v3 -d "your_device_name"
# Use Parakeet INT8 via Sherpa-ONNX (auto-download model bundle on first run)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 -d "your_device_name"
# Select a specific Sherpa-ONNX bundle id (positional argument)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8 -d "your_device_name"
# With specific Whisper model (loads device from config)
uv run python whispypy-daemon.py large-v3
# With additional options (loads device from config)
uv run python whispypy-daemon.py --keep-audio
# Update to different device
uv run python whispypy-daemon.py --device "new_device_name_here"
Note: When you specify a device with --device, it's automatically saved to ~/.config/whispypy/config.conf. Future runs without --device will use the saved device configuration.
The daemon will show its PID and wait for signals:
Script PID: 12345
To send signal from another terminal: kill -USR2 12345
Using audio device: alsa_input.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_6__source
Using transcription engine: whisper
Ready. Send SIGUSR2 to start/stop recording.
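The start/stop behaviour described by that output, where one signal toggles recording, can be sketched in a few lines of Python. This is only an illustration and not the daemon's actual code: RecordingToggle and install are hypothetical names, and the real daemon presumably starts and stops an audio recorder where this sketch only flips a flag.

```python
import signal


class RecordingToggle:
    """Flip between idle and recording each time the handler is called."""

    def __init__(self):
        self.recording = False
        self.events = []

    def handle(self, signum, frame):
        # The same signal both starts and stops recording.
        self.recording = not self.recording
        self.events.append("start" if self.recording else "stop")


def install(toggle, signum=signal.SIGUSR2):
    """Register toggle.handle for signum and return the previous handler."""
    return signal.signal(signum, toggle.handle)
```

A daemon loop built on this would call install(toggle) once, then block in signal.pause() and inspect toggle.recording after each wake-up to decide whether to begin capturing or to stop and transcribe.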
From another terminal, send signals to control recording:
# Start recording
kill -USR2 12345
# Stop recording and transcribe (send same signal again)
kill -USR2 12345
# Or use the convenience script (finds PID automatically)
./send_signal.sh
# Stop the daemon
kill -SIGINT 12345
# or press Ctrl+C in the daemon terminal
test_audio_devices.py:
- No arguments needed - automatically discovers and tests devices
- Interactive menu for device selection
usage: whispypy-daemon.py [-h] [--engine {whisper,parakeet,parakeet_onnx_int8}] [--parakeet-onnx-dir PARAKEET_ONNX_DIR] [--parakeet-onnx-model-id PARAKEET_ONNX_MODEL_ID] [--parakeet-onnx-cache-dir PARAKEET_ONNX_CACHE_DIR] [--onnx-provider {cpu,cuda}] [--onnx-threads ONNX_THREADS] [--check-model] [--device DEVICE] [--keep-audio] [--autopaste] [--verbose] [model_path]
Arguments:
model_path Model path or name. For Whisper: tiny, base, small, medium, large, large-v2, large-v3. For Parakeet: nvidia/parakeet-tdt-0.6b-v3. For parakeet_onnx_int8: optional sherpa-onnx bundle id (omit or use "base" to use default from --parakeet-onnx-model-id)
Options:
--engine, -e {whisper,parakeet,parakeet_onnx_int8} Transcription engine to use (default: whisper)
--parakeet-onnx-dir PARAKEET_ONNX_DIR Directory with encoder/decoder/joiner/tokens (advanced; bypass auto-download)
--parakeet-onnx-model-id PARAKEET_ONNX_MODEL_ID Sherpa-ONNX bundle id to download (default: sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8)
--parakeet-onnx-cache-dir PARAKEET_ONNX_CACHE_DIR Override cache directory for auto-downloaded bundles
--onnx-provider {cpu,cuda} Execution provider for sherpa-onnx (cuda falls back to cpu if unavailable)
--onnx-threads ONNX_THREADS Number of threads for sherpa-onnx (default: auto)
--check-model Load the selected model and exit
--device, -d DEVICE Audio input device name. If not provided, loads from ~/.config/whispypy/config.conf
--keep-audio Keep temporary audio files
--autopaste Automatically paste transcribed text after copying to clipboard
--verbose, -v Enable verbose logging
# 1. Test devices
uv run python test_audio_devices.py
# 2. Copy the working device name and run daemon (saves config)
uv run python whispypy-daemon.py -d "your_working_device_name"
# 3. Next time, just run without device (uses saved config)
uv run python whispypy-daemon.py
# Default Whisper engine with base model
uv run python whispypy-daemon.py
# Whisper with larger model
uv run python whispypy-daemon.py large-v3
# Parakeet engine (requires NeMo installation)
uv run python whispypy-daemon.py --engine parakeet nvidia/parakeet-tdt-0.6b-v3
# Parakeet with device specification (first time)
uv run python whispypy-daemon.py -e parakeet nvidia/parakeet-tdt-0.6b-v3 -d "your_device"
# Parakeet INT8 via Sherpa-ONNX (auto-download model bundle on first run)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8
# Parakeet INT8 via Sherpa-ONNX (explicit bundle id as positional argument)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8
# Parakeet INT8 via Sherpa-ONNX (prefer CUDA; falls back to CPU)
uv run python whispypy-daemon.py --engine parakeet_onnx_int8 --onnx-provider cuda
# First setup with larger model (saves device config)
uv run python whispypy-daemon.py large-v3 -d "your_device" --verbose
# Subsequent runs with same config
uv run python whispypy-daemon.py large-v3 --verbose
# Keep audio files for debugging (uses saved device)
uv run python whispypy-daemon.py --keep-audio
# Auto-paste transcribed text directly (copies to clipboard AND pastes automatically)
uv run python whispypy-daemon.py --autopaste
# Combine autopaste with other options
uv run python whispypy-daemon.py large-v3 --autopaste --verbose
# Parakeet with verbose logging
uv run python whispypy-daemon.py -e parakeet nvidia/parakeet-tdt-0.6b-v3 --verbose
The --autopaste flag enables automatic pasting of transcribed text directly into the currently focused application:
- Normal mode: Text is copied to clipboard only
- Auto-paste mode: Text is copied to clipboard AND automatically pasted
Requirements for auto-paste:
- Wayland: Install wtype, ydotool, or dotool (especially for non-QWERTY layouts)
  # Debian/Ubuntu
  sudo apt install wtype ydotool
  # Arch Linux
  sudo pacman -S wtype ydotool
  # dotool (from source)
  # See: https://git.sr.ht/~geb/dotool
- X11: Install xdotool
  # Debian/Ubuntu
  sudo apt install xdotool
  # Arch Linux
  sudo pacman -S xdotool
Usage example:
- Focus the application where you want the text (text editor, terminal, browser, etc.)
- Start recording by running ./send_signal.sh through your keyboard shortcut, so the application keeps focus
- Speak your text
- Stop recording by triggering the same shortcut again
- Text is automatically pasted in the focused application
Note: Auto-paste simulates a Ctrl+V keypress. If auto-paste fails, the text is still available in the clipboard for manual pasting.
The daemon automatically saves your audio device configuration to ~/.config/whispypy/config.conf when you specify --device. This allows you to run the daemon without specifying the device every time.
Configuration behavior:
- First run: Use --device "your_device_name" to save the device
- Subsequent runs: Simply run uv run python whispypy-daemon.py - device loads automatically
- Change device: Use --device "new_device_name" to update the saved configuration
Config file format:
You can check an example configuration file with config.conf.example.
[DEFAULT]
device = your_device_name_here
# Optional: Configure dotool keyboard layout for autopaste (Wayland only)
dotool_xkb_layout = fr
dotool_xkb_variant = bepo
Supported configuration options:
- device: Audio input device name
- dotool_xkb_layout: XKB keyboard layout for dotool (used in Wayland autopaste fallback)
- dotool_xkb_variant: XKB keyboard variant for dotool (used in Wayland autopaste fallback)
Note: The dotool settings are only used as a fallback when primary paste tools (wtype, ydotool) are not available on Wayland systems.
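The config file is plain INI with a single [DEFAULT] section, so it can be inspected with Python's standard configparser. The sketch below is an illustration of reading the three documented keys and is not the daemon's own loader; read_settings is a hypothetical name, and disabling interpolation is a choice made here so that a "%" in a device name would be kept literally.

```python
import configparser


def read_settings(text):
    """Parse whispypy-style config text and return the documented keys.

    Hypothetical helper; missing keys come back as None.
    """
    # interpolation=None keeps any "%" characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    defaults = parser.defaults()  # contents of the [DEFAULT] section
    return {
        "device": defaults.get("device"),
        "dotool_xkb_layout": defaults.get("dotool_xkb_layout"),
        "dotool_xkb_variant": defaults.get("dotool_xkb_variant"),
    }
```

To try it against a real file, pass the contents of ~/.config/whispypy/config.conf, for example with pathlib.Path(...).read_text(). Lines starting with "#" are treated as comments by configparser, matching the example above.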
Manual config management:
# View current config
cat ~/.config/whispypy/config.conf
# Remove config (forces device specification on next run)
rm ~/.config/whispypy/config.conf
If the daemon fails to record:
- Make sure you used the exact device name from uv run python test_audio_devices.py
- Verify the device is not in use by another application
- Check audio permissions
- Re-run uv run python test_audio_devices.py to confirm the device still works
- Make sure send_signal.sh is executable: chmod +x send_signal.sh
- Model download fails: Check internet connection; models are downloaded on first use
- Slow transcription: Try smaller models (tiny, base) for faster processing
- Memory issues: Use smaller models or check available RAM
- Model download timeout: Parakeet models are large (~600MB); ensure stable internet connection
- CUDA warnings: Parakeet will use CPU if CUDA isn't available (slower but functional)
- Import warnings: NeMo may show warnings about missing optional dependencies; these are usually harmless
The daemon automatically handles different audio formats:
- Whisper: Uses raw f32 audio data (.au files)
- Parakeet: Uses standard audio files (.wav files)
- Whisper: Better for general use, multiple languages, smaller models available
- Parakeet: Optimized for English, potentially faster with GPU acceleration
The transcribed text is automatically copied to your clipboard using the appropriate tool for your display server (wl-copy for Wayland, xclip/xsel for X11).
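Choosing a clipboard tool per display server can be sketched as follows. The detection logic here is an assumption (checking WAYLAND_DISPLAY and then looking for tools on PATH) and may differ from what the daemon does; pick_clipboard_command and copy_to_clipboard are hypothetical names. The command-line flags shown are the standard ones for xclip and xsel.

```python
import os
import shutil
import subprocess


def pick_clipboard_command(env=None, which=shutil.which):
    """Return an argv list for a clipboard tool, or None if none is found."""
    env = os.environ if env is None else env
    # Assumption: a set WAYLAND_DISPLAY means we are in a Wayland session.
    if env.get("WAYLAND_DISPLAY") and which("wl-copy"):
        return ["wl-copy"]
    if which("xclip"):
        return ["xclip", "-selection", "clipboard"]
    if which("xsel"):
        return ["xsel", "--clipboard", "--input"]
    return None


def copy_to_clipboard(text):
    """Pipe text to the chosen tool's stdin; return False if no tool exists."""
    command = pick_clipboard_command()
    if command is None:
        return False
    subprocess.run(command, input=text, text=True, check=True)
    return True
```

Passing the environment and the lookup function as parameters keeps the selection logic testable without any clipboard tool installed.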
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Do as you wish. Have fun.
