Push-to-talk speech-to-text client for the Ancroo Stack. Hold a hotkey, speak, release - text appears at your cursor.
Transcription is managed centrally by the Ancroo Backend — the client just sends audio. Lightweight binary for Linux and Windows, no local GPU required.
Phase 0 (Beta) — Ancroo Voice is functional for local use, but the backend it connects to runs without encryption or authentication by default and is still under active development. Intended for local/trusted networks only. See the Ancroo Roadmap for the security path forward.
%%{init: {'theme': 'neutral'}}%%
graph LR
mic["Microphone"]
voice["Ancroo Voice"]
backend["Ancroo<br/>Backend"]
stt["Speech to<br/>Text"]
mic --> voice
voice <--> backend
backend <--> stt
style mic fill:transparent,stroke:transparent,color:#1e3a5f
style voice fill:#fef08a,stroke:#eab308,color:#713f12
style backend fill:#d1fae5,stroke:#10b981,color:#064e3b
style stt fill:#fed7aa,stroke:#f97316,color:#7c2d12
- Push-to-Talk: Hold hotkey to record, release to transcribe
- Backend-Managed STT: Ancroo Backend handles model and server selection centrally
- Lightweight Binary: Small download, no GPU dependencies
- Linux + Windows: Pre-built binaries for both platforms
- Configurable Hotkeys: Any key combination (Ctrl+Space, Alt+R, etc.) with visual hotkey recorder
- Multi-Language: 10 languages + Auto-Detection
- Dark/Light Mode: Switch between dark and light themes
- UI Scaling: Adjustable font size (A-/A+ buttons)
- Auto-Copy: Optionally copy transcriptions to clipboard automatically
- GUI Record Button: On-screen record button as alternative to hotkeys (required on Wayland)
Tip: Pair Ancroo Voice with a Stream Deck or foot pedals for one-button dictation and workflow triggers. For an example setup, see the article "Supercharge Your AI Workflow: Speech-to-Text with Stream Deck".
Download the latest release for your platform:
| Platform | Download |
|---|---|
| Windows | AncrooVoice-Windows.zip |
| Linux | AncrooVoice-Linux.tar.gz |
Windows:
1. Extract ZIP
2. Run AncrooVoice.exe
Note: Windows SmartScreen may show an "Unknown publisher" warning — this is normal for unsigned open-source software. To proceed:
- Click "More info" → "Run anyway", or
- Right-click the .exe → Properties → check "Unblock" → Apply (removes the warning permanently)
Linux:
tar -xzf AncrooVoice-Linux.tar.gz
./AncrooVoice-Linux.sh
Wayland: Global hotkeys are not supported on Wayland due to security restrictions. Use the on-screen record button instead.
Edit the config file to point to your Ancroo Backend (.env on Linux, ancroo-voice.ini on Windows):
ANCROO_BACKEND_ENDPOINT=http://your-server:8900/api/v1/transcribe
Important: The Ancroo Backend must have at least one active STT provider configured. Use the Admin UI at http://your-server:8900/admin/stt-providers to register your STT server (e.g. Whisper-ROCm, Speaches).
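To illustrate the config format above, here is a minimal sketch of reading KEY=VALUE settings from text. It assumes the .env file consists of plain KEY=VALUE lines with optional `#` comments. The function name `parse_env_text` is hypothetical and this is not the client's actual loader. The layout of ancroo-voice.ini on Windows is not covered here.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the file uses the plain KEY=VALUE form shown above.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Ignore empty lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


sample = """
# Backend connection
ANCROO_BACKEND_ENDPOINT=http://your-server:8900/api/v1/transcribe
"""
settings = parse_env_text(sample)
print(settings["ANCROO_BACKEND_ENDPOINT"])
```

Splitting on the first `=` only means a value that itself contains `=` (for example, a URL with query parameters) is kept intact.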
- Select microphone
- Click "Start"
- Hold your hotkey (default: Ctrl+Space) and speak
- Release - text appears at cursor
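The hold-to-record, release-to-transcribe flow in the steps above can be pictured as a small state machine. The sketch below is an illustration only, not the application's actual code: the class `PushToTalkSession`, its method names, and the `transcribe` callback are all hypothetical, and real hotkey and audio callbacks would come from libraries such as pynput and sounddevice.

```python
class PushToTalkSession:
    """Collect audio chunks while a hotkey is held; transcribe on release."""

    def __init__(self, transcribe):
        # transcribe: hypothetical callable taking raw audio bytes, returning text.
        self._transcribe = transcribe
        self._recording = False
        self._chunks = []

    def on_hotkey_press(self):
        # Keyboards auto-repeat while a key is held, so ignore repeated presses.
        if not self._recording:
            self._recording = True
            self._chunks = []

    def on_audio_chunk(self, chunk: bytes):
        # Audio arriving outside a recording window is discarded.
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_release(self):
        # A release without a matching press does nothing.
        if not self._recording:
            return None
        self._recording = False
        return self._transcribe(b"".join(self._chunks))


# Usage with a stand-in transcriber that just reports the audio length.
session = PushToTalkSession(lambda audio: f"{len(audio)} bytes")
session.on_hotkey_press()
session.on_audio_chunk(b"\x00\x01")
session.on_audio_chunk(b"\x02")
result = session.on_hotkey_release()
```

Keeping the state machine free of any audio or GUI dependency makes it easy to drive from either a global hotkey listener or an on-screen record button.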
Ancroo Voice connects to the Ancroo Backend, which manages STT model and server selection centrally. The client sends audio and receives transcribed text — no local configuration of STT models needed.
| Variable | Required | Description |
|---|---|---|
| ANCROO_BACKEND_ENDPOINT | Yes | Ancroo Backend transcribe URL, e.g. http://your-server:8900/api/v1/transcribe |
| ANCROO_BACKEND_API_KEY | No | Bearer token for authenticated backends |
| ANCROO_BACKEND_VERIFY_SSL | No | Set to false for self-signed certificates |
Note: The endpoint points to the Ancroo Backend (default port 8900), not directly to a Whisper/STT server. The backend handles model and server routing internally.
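As a hedged sketch of how the three variables above could map onto an HTTP request, the helper below turns them into a URL, an Authorization header, and a TLS verification flag. The function name `build_request_options` is hypothetical, and the default of verifying certificates when ANCROO_BACKEND_VERIFY_SSL is unset is an assumption. The request body format the backend expects is not documented here, so it is left out.

```python
import os


def build_request_options(env=None):
    """Translate the ANCROO_BACKEND_* variables into HTTP request options."""
    env = os.environ if env is None else env

    endpoint = env.get("ANCROO_BACKEND_ENDPOINT")
    if not endpoint:
        raise ValueError("ANCROO_BACKEND_ENDPOINT is required")

    headers = {}
    api_key = env.get("ANCROO_BACKEND_API_KEY")
    if api_key:
        # The API key is described above as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"

    # Assumption: verification stays on unless the value is exactly "false".
    verify = env.get("ANCROO_BACKEND_VERIFY_SSL", "true").strip().lower() != "false"

    return {"url": endpoint, "headers": headers, "verify": verify}


options = build_request_options({
    "ANCROO_BACKEND_ENDPOINT": "http://your-server:8900/api/v1/transcribe",
    "ANCROO_BACKEND_API_KEY": "example-token",
    "ANCROO_BACKEND_VERIFY_SSL": "false",
})
```

The returned keys match keyword arguments accepted by `requests.post`, so a caller could pass them along together with the audio payload in whatever form the backend requires.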
| File | Purpose | Notes |
|---|---|---|
| .env / ancroo-voice.ini | Backend connection | .env on Linux, .ini on Windows |
| ancroo-voice_config.json | GUI settings | Auto-saved by the application |
This project is built with the following open-source software:
| Project | Purpose | License |
|---|---|---|
| CustomTkinter | GUI framework | MIT |
| pynput | Global hotkey listener | LGPL-3.0 |
| sounddevice | Audio recording | MIT |
| NumPy | Audio processing | BSD-3-Clause |
| Pillow | Image handling | HPND |
| Requests | HTTP client | Apache-2.0 |
Speech-to-text is provided by OpenAI Whisper (MIT) models running on your server via the Ancroo Stack.
Contributions are welcome! Feel free to open an issue or submit a pull request.
To report a security vulnerability, please use GitHub's private vulnerability reporting instead of opening a public issue.
MIT — see LICENSE. The Ancroo name is not covered by this license and remains the property of the author.
Stefan Schmidbauer — GitHub · stefan@ancroo.com
Built with the help of AI (Claude by Anthropic).
