Skip to content

Orellius/orellius-stt

Repository files navigation

orellius-stt

Hold a key, speak Hebrew, release - clean English lands wherever your cursor is. No UI interaction. No clipboard round-trip. No window switching.

Built for native Hebrew speakers who think in Hebrew and need to type in English all day. The OS voice-typing tools don't handle Hebrew well, and flipping keyboard layouts kills flow. This fixes that.

Important

macOS only, Apple Silicon (arm64) required. The HID injection layer uses CGEventTap which is macOS-specific; the Rust native module compiles for arm64 only.


How it works

hotkey down  →  audio capture (16 kHz mono, Web Audio API)
hotkey up    →  whisper-rs transcribes Hebrew locally (ivrit-ai model)
             →  Ollama translates Hebrew → English (local, no cloud)
             →  filler words stripped (אממ, כאילו, etc.)
             →  dictionary rewrites applied
             →  text injected via CGEvent HID tap into focused app

A 340×56 floating pill shows state - idle / recording / transcribing / translating / result / error - with a real-time equalizer during recording. Tray icon color mirrors the state.

Works in every app, including ones that block synthetic input, because injection goes through the HID layer.


Stack

Layer Technology
Shell Electron 41, TypeScript
UI React 19 + Vite
Hotkey napi-rs (Rust), CGEventTap at session level
Paste napi-rs + CGEvent.set_string() + post(HID)
STT whisper-rs + ivrit-ai/whisper-large-v3-turbo-ggml
Translation Ollama - local LLM, no cloud
Packaging electron-builder, arm64 DMG

Requirements

  • macOS 11+, Apple Silicon (arm64)
  • Ollama running on localhost:11434
  • A translation model pulled - default: gemma4:26b-a4b-it-q8_0 (configurable in settings)

Note

On first run, the Whisper model (~3 GB) is downloaded automatically via huggingface_hub. Set WHISPER_MODEL_PATH to point to a local copy and skip the download.

Permissions needed on first run:

  • Microphone
  • Accessibility (for the hotkey tap and HID paste)

Quick Start

# Install dependencies
bun install

# Build the Rust native module (first time only - takes ~1 min)
bun run build:native

# Start in dev mode (hot reload)
bun run dev

Build a release DMG

bun run build    # native + renderer + main
bun run dist     # → release/orellius-stt-0.1.0-arm64.dmg

Typecheck

bun run typecheck

Configuration

All config lives in ~/.config/orellius/:

File Contents
settings.json hotkey key, autoPaste toggle, languageMode (he-en / en-he)
history.json past transcriptions + translations
dictionary.json custom term rewrites (managed from Settings UI)

Tip

To use a lighter model for faster translation at the cost of quality, set OLLAMA_MODEL=qwen2.5:3b before launching. Latency drops to ~0.5s on a 3B model vs ~2–3s on 26B.

Changing the Whisper model

The default model path follows the standard huggingface_hub cache layout:

~/.cache/huggingface/hub/models--ivrit-ai--whisper-large-v3-turbo-ggml/

To use a different model, update the path in Settings or set WHISPER_MODEL_PATH before launching.


Supported modes

Mode Direction
he-en (default) Hebrew speech → English text
en-he English speech → Hebrew text

Toggle in the Settings page inside the app.


Architecture

orellius-stt/
├── electron/
│   ├── main.ts                   Tray, windows, IPC, native bindings
│   └── services/
│       ├── ollama.ts             Translation HTTP + filler stripping + dictionary
│       ├── settings.ts           Settings loader/saver
│       ├── history.ts            Transcription history persistence
│       ├── dictionary.ts         Custom rewrite rules
│       └── logs.ts               In-app log capture
├── native-core/                  Rust napi-rs module
│   └── src/
│       ├── lib.rs                whisper-rs wrapper
│       ├── hotkey.rs             CGEventTap (right-cmd / ctrl / F5 / F6)
│       └── paste.rs              HID injection via CGEvent
└── renderer/
    └── src/
        ├── App.tsx               State machine (idle→recording→…→result)
        ├── audio.ts              16 kHz mono resample
        └── pages/                Settings, History, Dictionary, Overview

Performance

Tested on Mac Studio M4 Max 64GB:

Step Latency
End-to-end (short phrase) ~2–5 s
Model cold start ~1–2 s (cached after first use)
Translation (3B–7B model) ~0.5–1 s
Translation (26B model) ~2–3 s

Contributing

Issues and PRs welcome. The Rust native module (native-core/) is the most interesting part - CGEventTap + HID injection is not well-documented and any improvements there are valuable.

License

MIT

About

Hold a key, speak Hebrew, release — clean English lands in any focused app. Local whisper-rs + Ollama, no cloud. macOS arm64.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors