Tap a key, speak, get text. A fully local speech-to-text desktop app.
voxtap is a keyboard-driven dictation tool that runs faster-whisper entirely on your machine. No cloud API, no telemetry, no audio ever leaves your device. It ships with a lightweight rich-text editor, optional LLM polishing via Ollama, and a toggle hotkey that works on Linux (X11 & Wayland), macOS, and Windows.
- 100% local — all transcription and polishing run on your hardware
- Tap-to-toggle hotkey — single keybinding starts/stops recording across apps (`voxtap-toggle`)
- Rich text editor — bold, italic, underline, headings, lists, alignment; copies as Markdown
- Live transcription — audio is buffered and decoded every 1.5 s; text streams in as you speak
- Optional LLM polish — cleans filler words and punctuation via a local Ollama model
- Screenshot paste — `Ctrl+V` of an image inserts the file path directly
- Spotify auto-pause — pauses playback while recording, resumes on stop (Linux)
- MCP control surface — the app is scriptable by Claude Code / any MCP client for E2E testing
- Cross-platform — Linux (X11 & Wayland), macOS, Windows
graph LR
User[User] -->|hotkey / click| Toggle[voxtap-toggle]
Toggle -->|IPC| App[voxtap App]
Mic[(Microphone)] --> App
App -->|1.5s audio chunks| Whisper[faster-whisper]
Whisper --> Editor[Qt Rich-Text Editor]
Editor -.optional.-> Ollama[Ollama LLM polish]
Ollama --> Editor
Editor -->|on close / on demand| Clipboard[(System Clipboard<br/>Markdown)]
MCP[MCP Client / Claude Code] -.test harness.-> Control[Control Server<br/>127.0.0.1:29998]
Control --> App
Core components:
| Module | Purpose |
|---|---|
| `src/voxtap/app.py` | Qt app, audio capture, Whisper inference, rich-text editor |
| `src/voxtap/toggle.py` | voxtap-toggle entry point; launches or toggles a running instance |
| `src/voxtap/control_server.py` | Optional TCP control server for MCP-driven automation |
| `src/voxtap/clipboard.py` | Cross-platform clipboard (xclip / wl-copy / pbcopy / PowerShell) |
| `mcp_server/pyqt_mcp.py` | stdio MCP server exposing launch_app, click, get_snapshot, ... |
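The clipboard row lists one backend per platform. A minimal sketch of how such a choice could be made is shown here; it is not taken from `clipboard.py`. The function names, the `WAYLAND_DISPLAY` check, and the exact PowerShell invocation are assumptions for illustration.

```python
import os
import subprocess
import sys


def clipboard_command(platform=sys.platform, env=os.environ):
    """Pick a clipboard command line for the given platform (illustrative only)."""
    if platform == "darwin":
        return ["pbcopy"]
    if platform == "win32":
        # Assumption: piping stdin into Set-Clipboard; voxtap may call PowerShell differently.
        return ["powershell", "-NoProfile", "-Command", "$input | Set-Clipboard"]
    if env.get("WAYLAND_DISPLAY"):
        # Assumption: a Wayland session is detected through this environment variable.
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def copy_text(text):
    """Send text to the system clipboard through the selected command's stdin."""
    subprocess.run(clipboard_command(), input=text.encode("utf-8"), check=True)
```

Keeping the selection separate from the subprocess call makes the platform logic easy to check without touching a real clipboard.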
pip install voxtap
voxtap
The first run downloads the Whisper model (~1.5 GB for distil-large-v3). A progress dialog shows download status; recording starts automatically once the model is loaded.
voxtap needs a working audio input, a clipboard utility, and Qt6.
Debian / Ubuntu:
sudo apt install portaudio19-dev xclip
# Wayland users: sudo apt install wl-clipboard
Fedora:
sudo dnf install portaudio-devel xclip
Arch Linux:
sudo pacman -S portaudio xclip
macOS:
brew install portaudio
# pbcopy ships with macOS, so no extra clipboard tool is needed
Windows:
No extra system dependencies: PortAudio is bundled with sounddevice, and clipboard access uses PowerShell.
voxtap # Start with defaults (distil-large-v3, English)
voxtap --model small # Smaller / faster model
voxtap --model large-v3 # Full large model for max accuracy
voxtap --language de # Transcribe German
voxtap --device cpu # Force CPU (skip CUDA auto-detection)
Bind voxtap-toggle to a key in your window manager for quick access. If an instance is already running, it toggles recording on/off; otherwise it launches a new one. See docs/keybindings.md for setup instructions for i3, Sway, Hyprland, GNOME, KDE, Windows, and macOS.
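The toggle-or-launch behaviour described above can be sketched as follows. This illustrates the pattern only and is not the code in `toggle.py`: the loopback address, the port `29997`, and the `toggle` message are all hypothetical, and voxtap's real IPC channel may work differently.

```python
import socket
import subprocess

# Hypothetical IPC address; voxtap's real channel is not documented here.
IPC_ADDR = ("127.0.0.1", 29997)


def launch_voxtap():
    """Start a new voxtap process without waiting for it to exit."""
    subprocess.Popen(["voxtap"])


def toggle_or_launch(addr=IPC_ADDR, launch=launch_voxtap):
    """Toggle a running instance if one answers on addr, otherwise launch one."""
    try:
        with socket.create_connection(addr, timeout=0.5) as conn:
            conn.sendall(b"toggle\n")  # hypothetical message format
        return "toggled"
    except OSError:
        # Nothing is listening, so no instance is running yet.
        launch()
        return "launched"
```

Because the whole decision lives in one short-lived process, a window-manager keybinding can call it repeatedly without tracking any state itself.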
| Flag | Default | Description |
|---|---|---|
| `--model` | `distil-large-v3` | Whisper model (tiny, small, medium, large-v3, distil-large-v3, ...) |
| `--language` | `en` | Language code (en, de, fr, es, ...) |
| `--device` | `auto` | cpu, cuda, or auto (tries CUDA first) |
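For readers scripting around the CLI, the flag table maps onto a small `argparse` definition. The sketch below mirrors the documented flags and defaults; `build_parser` is a hypothetical name, and voxtap's actual parser may define more options or validate values differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented voxtap flags (illustrative)."""
    parser = argparse.ArgumentParser(prog="voxtap")
    parser.add_argument("--model", default="distil-large-v3",
                        help="Whisper model name, e.g. tiny, small, large-v3")
    parser.add_argument("--language", default="en",
                        help="Language code, e.g. en, de, fr, es")
    parser.add_argument("--device", default="auto",
                        choices=["auto", "cpu", "cuda"],
                        help="Inference device; auto tries CUDA first")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```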
| Variable | Description |
|---|---|
| `VOXTAP_CONTROL_PORT` | If set, starts the TCP control server on this port (loopback only). Used by the MCP server; do not set in production. |
| `VOXTAP_CONTROL_HOST` | Bind address for the control server. Defaults to 127.0.0.1. |
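A test harness may want to check whether the control server is configured and reachable before driving the app. The sketch below reads the two variables from the table and attempts a plain TCP connection. The helper names are hypothetical, and it deliberately sends no commands, since the wire format is not described here.

```python
import os
import socket


def control_endpoint(env=os.environ):
    """Return (host, port) if the control server is enabled, else None."""
    port = env.get("VOXTAP_CONTROL_PORT")
    if not port:
        return None  # variable unset: the control server is not started
    return env.get("VOXTAP_CONTROL_HOST", "127.0.0.1"), int(port)


def is_listening(endpoint, timeout=0.5):
    """Check whether something accepts TCP connections at the endpoint."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False
```

For example, with `VOXTAP_CONTROL_PORT=29998` exported, `control_endpoint()` yields `("127.0.0.1", 29998)`, which can be passed directly to `is_listening`.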
- Bold (Ctrl+B), Italic (Ctrl+I), Underline (Ctrl+U), Strikethrough (Ctrl+Shift+S)
- Headings (H1, H2, H3)
- Bullet and numbered lists
- Text alignment (left, center, right)
- Paste image paths — Ctrl+V with a screenshot inserts the file path
- Copy as Markdown — button or automatic on close
- Full undo/redo
- voxtap opens a Qt window and starts recording from your microphone
- Audio is buffered and transcribed every 1.5 seconds using faster-whisper
- Transcribed text is appended to the editor (or replaces the selected text)
- You can pause recording, edit text freely, then resume
- If Ollama is running locally, transcribed text is polished (filler words removed, punctuation fixed)
- On close (Escape), the editor content is copied to the clipboard as Markdown
- Spotify is automatically paused during recording and resumed on stop (Linux)
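The 1.5-second buffering step in the list above can be sketched as a small accumulator that hands out fixed-size chunks for decoding. The 16 kHz mono sample rate is what Whisper models expect. The class name and the non-overlapping chunking are assumptions, and voxtap's actual audio pipeline may differ.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 1.5    # decode interval described above


class ChunkBuffer:
    """Collect incoming samples and emit complete fixed-length chunks."""

    def __init__(self, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
        self.chunk_size = int(sample_rate * chunk_seconds)
        self._samples = []

    def push(self, samples):
        """Add captured samples; return the chunks that are ready to decode."""
        self._samples.extend(samples)
        chunks = []
        while len(self._samples) >= self.chunk_size:
            chunks.append(self._samples[:self.chunk_size])
            del self._samples[:self.chunk_size]
        return chunks

    def pending(self):
        """Number of buffered samples that do not yet fill a chunk."""
        return len(self._samples)
```

An audio callback would call `push` with each block of microphone samples and pass every returned chunk to the transcriber, so text arrives in the editor roughly every 1.5 seconds.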
voxtap can use a local LLM via Ollama to clean up transcriptions — removing filler words, fixing punctuation, correcting repeated words. Entirely optional; transcription works fine without it.
- Install Ollama: https://ollama.com/download
- Pull a model: `ollama pull gpt-oss:20b`
- Make sure Ollama is running (`ollama serve` or the desktop app), then start voxtap as usual. The status bar shows the active LLM model.
If Ollama is not running or the model is not available, voxtap silently skips the polish step.
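The skip-on-failure behaviour can be illustrated with a short sketch against Ollama's `/api/generate` endpoint on its default port 11434. The `polish` function and its prompt are hypothetical; voxtap's real prompt and error handling are not shown in this document.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def polish(text, model="gpt-oss:20b", base_url=OLLAMA_URL, timeout=10.0):
    """Return an LLM-cleaned version of text, or text unchanged on any failure."""
    # Hypothetical prompt; the wording voxtap uses is not documented here.
    prompt = ("Remove filler words and fix punctuation. "
              "Return only the cleaned text.\n\n" + text)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        base_url + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            cleaned = json.loads(reply.read())["response"].strip()
    except (OSError, ValueError, KeyError, TypeError):
        # Ollama unreachable, model missing, or malformed reply: skip polishing.
        return text
    return cleaned or text
```

Returning the input unchanged on every failure path keeps transcription usable whether or not Ollama is installed.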
voxtap embeds an optional TCP control server that exposes every UI action (click, fill, snapshot, screenshot, toggle recording, set transcript, read state) as JSON commands. A companion MCP server (mcp_server/pyqt_mcp.py) wraps this for use with Claude Code or any MCP client — letting an agent drive the app end-to-end without real audio.
pip install "voxtap[mcp]"
See docs/testability_via_mcp.md for tool reference, naming conventions, and example flows.
voxtap/
├── src/voxtap/
│ ├── app.py # Qt app, audio pipeline, Whisper, editor
│ ├── toggle.py # voxtap-toggle entry point (IPC)
│ ├── control_server.py # Optional TCP control server (MCP)
│ └── clipboard.py # Cross-platform clipboard
├── mcp_server/
│ └── pyqt_mcp.py # stdio MCP server wrapping the control server
├── docs/
│ ├── keybindings.md
│ └── testability_via_mcp.md
├── assets/
│ └── logo.png
└── pyproject.toml
| Symptom | Fix |
|---|---|
| `PortAudio library not found` | Install portaudio19-dev (Debian), portaudio-devel (Fedora), portaudio (brew / pacman) |
| `wl-copy / xclip not found` | Install a clipboard utility matching your display server (see System Dependencies) |
| CUDA not detected | Install PyTorch with CUDA support, or run with --device cpu |
| Slow on CPU | Use a smaller model: voxtap --model small |
| LLM polish not working | Ensure Ollama is running and the model is pulled; this feature is optional |
Contributions are welcome! Please open an issue to discuss substantial changes before sending a PR.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT © Lukas Kellerstein

