VoiceLayer

VoiceLayer is a local-first voice composition layer for Ubuntu desktop workflows. It combines fast dictation, structured text composition, rewrite, and translation into a single daemon, CLI/TUI, and host-injection stack.

Scope

VoiceLayer is designed for:

Browser text areas and document editors
IDE input surfaces and comment fields
Terminal and TUI applications such as tmux, Neovim, Claude Code, and Codex CLI
Drafting workflows that need preview and confirmation before insertion

VoiceLayer is not designed as:

A traditional IME candidate window
A subtitle-only transcriber
A browser-only extension
A cloud-only voice assistant

Architecture

crates/voicelayer-core: shared domain types and injection planning
crates/voicelayer-doc-test-utils: dev-only helpers shared by the workspace's repository-wide markdown guard tests
crates/voicelayerd: Unix-socket daemon and /v1 control API
crates/vl: CLI/TUI entry point and operator tooling
crates/vl-desktop: interactive GUI shell that talks to the daemon over the same socket
python/voicelayer_orchestrator: JSON-RPC worker protocol and provider orchestration entry point
systemd/: user-service templates for the daemon and the optional persistent whisper-server
scripts/install.sh: one-shot installer that builds release binaries and seeds ~/.local/bin/, ~/.config/systemd/user/, and ~/.config/voicelayer/
docs/: architecture, host strategy, and operations documentation
openapi/: local API contract

Current Status

Shipped today:

Rust workspace with voicelayer-core, voicelayerd, vl, and vl-desktop
/v1 control API over a Unix domain socket with Server-Sent Events at /v1/events/stream
One-shot and fixed-duration segmented live dictation (POST /v1/sessions/dictation with segmentation.mode = one_shot | fixed), with per-segment segment_recorded / segment_transcribed events and a concatenated transcript on stop
vl dictation foreground-ptt alternate-screen panel with hold-to-record, transcript scrolling, clipboard restore, and tmux / WezTerm / Kitty targets
vl-desktop GUI overlay that shares the same socket, session state, and event stream as the CLI
Real ASR via whisper.cpp: one-shot whisper-cli plus an optional persistent whisper-server endpoint (with autostart) for warm-model reuse
Optional silero-vad pre-pass inside the Python worker that trims non-speech before whisper
Optional Xiaomi MiMo-V2.5-ASR backend (CUDA-only, opt-in via provider_id) for multilingual and quality-priority transcription, selectable per-request on /v1/transcriptions and per-session on the dictation pipeline (vl dictation start --provider-id mimo_v2_5_asr, vl record-transcribe --provider-id ..., vl dictation foreground-ptt --provider-id ...); see docs/guides/local-asr-provider.md
Real LLM integration via OpenAI-compatible chat completions, with optional llama-server autostart for local endpoints
Live Rust↔Python stdio JSON-RPC bridge through the uv-managed project environment
systemd user units for voicelayerd and the optional whisper-server, plus scripts/install.sh
vl doctor surfaces recorder diagnostics, whisper mode (cli / server / unconfigured), LLM reachability, portal support, and systemd unit state

Not yet implemented (documented and scoped):

GNOME portal hotkey binding beyond availability probing
AT-SPI writable target discovery
Always-on background microphone and mid-utterance partial transcripts
VAD-driven segmentation boundaries at the recorder layer (fixed-duration segmentation is shipped; adaptive VAD-driven segmentation is a later stage)
.deb packaging

Development

Requirements

Rust 1.88+
Python 3.12+
uv 0.11+
Ubuntu with PipeWire

Verification Chain

The authoritative commands every change must pass before merge (also mirrored in CLAUDE.md):

cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all
uv sync --group dev
uv run ruff check python tests/python
uv run ruff format --check python tests/python
uv run pytest -q tests/python

Python commands in this repository should always run through uv.

Worker Runtime

The daemon and CLI launch the Python worker from the project-managed environment. Resolution order is:

VOICELAYER_PROJECT_ROOT/.venv/bin/python -m voicelayer_orchestrator.worker
uv run --project <project_root> python -m voicelayer_orchestrator.worker
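
The resolution order can be sketched in a few lines of Python. This is an illustration, not the daemon's actual code: the function name is hypothetical, and it assumes the fallback to uv is taken when the .venv interpreter is missing, which this README does not spell out.

```python
from pathlib import Path


def resolve_worker_command(project_root: Path) -> list[str]:
    """Return an argv for the worker, trying the two documented options in order."""
    module_args = ["-m", "voicelayer_orchestrator.worker"]
    venv_python = project_root / ".venv" / "bin" / "python"
    if venv_python.is_file():
        # Option 1: the interpreter inside the project-managed virtualenv.
        return [str(venv_python), *module_args]
    # Option 2 (assumed fallback condition): let uv resolve the environment.
    return ["uv", "run", "--project", str(project_root), "python", *module_args]
```

Returning an argv list instead of a shell string keeps paths with spaces intact when the result is handed to a process spawner.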

If you run the daemon outside the repository root, set VOICELAYER_PROJECT_ROOT explicitly. When a local LLM endpoint is configured, vl doctor also probes endpoint reachability through /v1/models. If VOICELAYER_LLM_AUTO_START=true, the worker can also auto-launch llama-server for local endpoints.

Run the Daemon

cargo run -p vl -- daemon run --project-root "$(pwd)"

By default the daemon listens on:

$XDG_RUNTIME_DIR/voicelayer/daemon.sock
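
A client that wants to find the socket without asking the CLI could compute the same path. The sketch below is a hypothetical helper; raising when XDG_RUNTIME_DIR is unset is an assumption, since the daemon's behavior in that case is not documented here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_socket_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Build $XDG_RUNTIME_DIR/voicelayer/daemon.sock from the environment."""
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        # Assumption: fail loudly rather than guess a fallback directory.
        raise RuntimeError("XDG_RUNTIME_DIR is not set")
    return Path(runtime_dir) / "voicelayer" / "daemon.sock"
```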

Inspect the Environment

cargo run -p vl -- doctor

Inspect Providers

cargo run -p vl -- providers

Configure a Local LLM Endpoint

See docs/guides/local-llm-provider.md for the llama.cpp server path and environment variables.

Configure a Local ASR Provider

See docs/guides/local-asr-provider.md for the whisper.cpp file transcription path, the optional persistent whisper-server deployment, and the optional silero-vad pre-pass.

Launch the Desktop Shell

See docs/guides/desktop.md for vl-desktop usage and the two client-side environment variables (VOICELAYER_VL_BIN, VOICELAYER_LOG).

Install as a systemd User Service

See docs/guides/systemd.md for scripts/install.sh, the voicelayerd unit, and the optional dedicated voicelayer-whisper-server unit (with a Docker drop-in).

Render a Bracketed Paste Payload

cargo run -p vl -- print-bracketed-paste "Analyze the current repository authentication flow."
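
For orientation, bracketed paste wraps the text in the standard ESC[200~ and ESC[201~ markers so the receiving application treats it as pasted text, not typed keys. The Python sketch below shows that framing; the function name is hypothetical, and stripping an embedded end marker is a defensive assumption, not a documented behavior of vl.

```python
PASTE_START = "\x1b[200~"  # standard bracketed-paste start marker
PASTE_END = "\x1b[201~"    # standard bracketed-paste end marker


def bracketed_paste(text: str) -> str:
    """Wrap text in bracketed-paste markers."""
    # Assumption: drop any embedded end marker so the payload cannot
    # terminate the paste early and have its tail interpreted as keystrokes.
    safe = text.replace(PASTE_END, "")
    return f"{PASTE_START}{safe}{PASTE_END}"
```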

Transcribe a Local Audio File

cargo run -p vl -- transcribe-file /path/to/sample.wav --language auto

Record and Transcribe a Short Clip

cargo run -p vl -- record-transcribe --duration-seconds 8 --language auto

The CLI prefers pw-record, wrapped in timeout --signal=INT, and falls back to arecord. Internally it reuses the same daemon-side dictation capture flow that the UI and hotkey layer call.

The daemon exposes a live dictation session flow:

POST /v1/sessions/dictation starts recording
POST /v1/sessions/dictation/stop stops recording and returns the transcript
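
Because the API is served over a Unix domain socket, a plain HTTP client needs a small adapter. The following is a minimal sketch using only the Python standard library; UnixHTTPConnection and post_json are hypothetical names, and the response is returned as raw text because its schema is defined in openapi/, not here.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 30.0) -> None:
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def post_json(socket_path: str, path: str, payload: dict) -> tuple[int, str]:
    """POST a JSON body and return (status code, response text)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "POST",
            path,
            body=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
```

With the daemon running, a call such as post_json(socket_path, "/v1/sessions/dictation", {"segmentation": {"mode": "one_shot"}}) would start a session, assuming no other request fields are required.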

The request body's segmentation field selects between one-shot and fixed-duration segmented capture:

{"mode": "one_shot"} (default) records a single WAV from start to stop and transcribes it once.
{"mode": "fixed", "segment_secs": N} rolls the recorder every N seconds; each finalized chunk is transcribed in the background and the per-segment events surface on /v1/events/stream (dictation.segment_recorded, dictation.segment_transcribed) while stop returns the concatenated transcript.
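
A client-side sketch of both halves: building the segmentation part of the request body, and splitting a Server-Sent Events stream into (event, data) pairs. The helper names are hypothetical. It assumes the daemon carries the event name in the SSE event: field and a payload in data:, and that segment_secs must be positive; check openapi/ for the actual contract.

```python
from typing import Iterable, Iterator, Optional


def segmentation_body(mode: str = "one_shot", segment_secs: Optional[int] = None) -> dict:
    """Build the segmentation field for POST /v1/sessions/dictation."""
    if mode == "one_shot":
        return {"segmentation": {"mode": "one_shot"}}
    if mode == "fixed":
        # Assumption: the daemon expects a positive number of seconds.
        if segment_secs is None or segment_secs <= 0:
            raise ValueError("fixed mode needs a positive segment_secs")
        return {"segmentation": {"mode": "fixed", "segment_secs": segment_secs}}
    raise ValueError(f"unknown segmentation mode: {mode!r}")


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event name, data) pairs from raw Server-Sent Events lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

A consumer would typically filter the yielded pairs for dictation.segment_recorded and dictation.segment_transcribed and decode the data string as JSON.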

The vl CLI exercises that control plane directly:

cargo run -p vl -- dictation start --backend pipewire --language auto
cargo run -p vl -- dictation stop <session-id>

The vl dictation foreground-ptt command uses an alternate-screen status panel instead of streaming JSON on each transition. The panel shows:

current dictation status
active session ID
last completed session ID
last transcript preview
last injection result
last error
recent events

Panel controls:

j / k or Up / Down to scroll the full transcript view
PageUp / PageDown for larger transcript jumps
c to copy the last completed transcript to the system clipboard on demand
r to restore the saved text clipboard backup after the tool has overwritten the clipboard
i to re-apply the last injection target
s to save the last transcript to a timestamped text file
d to discard the last transcript from the panel
Esc to exit

If you also want a clipboard fallback after each completed dictation:

cargo run -p vl -- dictation foreground-ptt --backend pipewire --language auto --copy-on-stop

This writes the finished transcript to the system clipboard before any optional terminal-target injection.

You can change the default stop behavior without leaving the panel:

cargo run -p vl -- dictation foreground-ptt \
  --default-stop-action inject \
  --restore-clipboard-on-exit \
  --save-dir ~/Documents/voice-layer

Available default stop actions are:

none
copy
inject
save

VoiceLayer can also persist these defaults in a local config file:

cargo run -p vl -- config path
cargo run -p vl -- config init-defaults
cargo run -p vl -- config show
cargo run -p vl -- config set foreground_ptt.default_stop_action inject

The config file lives at:

~/.config/voicelayer/config.toml

For terminal-focused fallback usage, vl also provides a foreground raw-terminal mode:

cargo run -p vl -- dictation foreground-ptt --backend pipewire --language auto

When the terminal reports key release events, this behaves like hold-to-record. When release events are not available, it degrades to:

first key press starts dictation
second key press stops dictation
Esc exits the mode
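
The two behaviors amount to a small state machine. The sketch below models it in Python for illustration only; the class, enum, and returned action strings are hypothetical, and ignoring repeated press events while a key is held is an assumption about how key repeat is handled.

```python
from enum import Enum


class Key(Enum):
    PRESS = "press"
    RELEASE = "release"
    ESC = "esc"


class PttController:
    """Hold-to-record when release events exist, press-to-toggle otherwise."""

    def __init__(self, has_release_events: bool) -> None:
        self.has_release_events = has_release_events
        self.recording = False

    def handle(self, key: Key) -> str:
        if key is Key.ESC:
            return "exit"
        if self.has_release_events:
            if key is Key.PRESS and not self.recording:
                self.recording = True
                return "start"
            if key is Key.RELEASE and self.recording:
                self.recording = False
                return "stop"
            return "ignore"  # e.g. key repeat while the key is held (assumed)
        if key is Key.PRESS:
            # Degraded mode: each press flips the recording state.
            self.recording = not self.recording
            return "start" if self.recording else "stop"
        return "ignore"
```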

If you run the controller inside tmux and want the transcript pasted into another pane:

cargo run -p vl -- dictation foreground-ptt --backend pipewire --language auto --tmux-target-pane %2

This uses tmux set-buffer plus tmux paste-buffer -dpr -t <pane>. The controller refuses to paste into the same pane that is currently running foreground-ptt.
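
In tmux terms, paste-buffer -d deletes the buffer after pasting, -p uses bracketed paste when the target application has requested it, and -r leaves line feeds untouched. A hypothetical Python helper that assembles the two commands and applies the same-pane guard might look like this; the -- separator before the text is an added precaution, not something this README documents, and the current pane would normally come from the TMUX_PANE environment variable that tmux sets.

```python
def tmux_paste_commands(text: str, target_pane: str, current_pane: str) -> list[list[str]]:
    """Return the argv lists for loading a buffer and pasting it into a pane."""
    if target_pane == current_pane:
        # Mirrors the documented refusal to paste into the controller's own pane.
        raise ValueError("refusing to paste into the pane running foreground-ptt")
    return [
        # Assumption: "--" guards against transcripts that start with "-".
        ["tmux", "set-buffer", "--", text],
        ["tmux", "paste-buffer", "-dpr", "-t", target_pane],
    ]
```

Each argv list can be passed to subprocess.run(argv, check=True) in order.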

If you omit --tmux-target-pane while running inside tmux:

zero candidate panes: no tmux injection is attempted
one candidate pane: it is selected automatically
multiple candidate panes: vl prompts you to choose a target pane before entering raw mode
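
The three cases reduce to a short selection function. This Python sketch uses a hypothetical prompt callback to stand in for the interactive chooser; how vl enumerates candidate panes is not covered here.

```python
from typing import Callable, Optional, Sequence


def choose_target_pane(
    candidates: Sequence[str],
    prompt: Callable[[Sequence[str]], str],
) -> Optional[str]:
    """Pick a tmux target pane following the zero / one / many rules."""
    if not candidates:
        return None  # no tmux injection is attempted
    if len(candidates) == 1:
        return candidates[0]  # selected automatically
    # Several panes: ask the user before raw mode is entered.
    return prompt(candidates)
```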

For terminal-specific explicit targets outside tmux:

cargo run -p vl -- dictation foreground-ptt --wezterm-target-pane-id 12
cargo run -p vl -- dictation foreground-ptt --kitty-match 'title:Output'

These routes are explicit-only:

WezTerm uses wezterm cli send-text --pane-id
Kitty uses kitten @ send-text --match ... --stdin --bracketed-paste auto

VoiceLayer does not auto-discover WezTerm or Kitty targets yet.

Inspect Global Shortcuts Portal Support

cargo run -p vl -- hotkeys portal-status

This checks whether the current desktop session exposes org.freedesktop.portal.GlobalShortcuts.

Product Defaults

Desktop target: Ubuntu GNOME Wayland
Local ASR baseline: whisper.cpp
Local LLM baseline: Gemma 4 via llama.cpp-compatible deployment
GUI insertion priority: AT-SPI, then clipboard, then keyboard simulation fallback
Terminal insertion priority: bracketed paste, then terminal-specific adapters
Preview surface: CLI/TUI first, GUI preview later
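
The insertion priorities describe an ordered fallback chain. A generic Python sketch of that pattern follows; the function and the strategy callables are hypothetical placeholders, and treating an exception as "try the next strategy" is an assumption, not a description of voicelayer-core's injection planning.

```python
from typing import Callable, Optional, Sequence

# A strategy is a name plus a callable that returns True on success.
Strategy = tuple[str, Callable[[str], bool]]


def insert_with_fallback(text: str, strategies: Sequence[Strategy]) -> Optional[str]:
    """Try each strategy in priority order; return the name of the first that succeeds."""
    for name, attempt in strategies:
        try:
            if attempt(text):
                return name
        except Exception:
            # Assumption: a failing strategy falls through to the next one.
            continue
    return None
```

For GUI targets the list would be ordered AT-SPI, clipboard, keyboard simulation; for terminals, bracketed paste followed by terminal-specific adapters.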

License

The repository is intended to ship under the Apache License 2.0.
