# LocalTranslate

A macOS menu bar application that captures microphone audio and transcribes it locally using OpenAI's Whisper model, with optional speaker diarization.
## Features

- Local transcription: Records audio sessions and transcribes them using OpenAI's Whisper model running entirely on-device
- Menu bar integration: Runs as a lightweight macOS menu bar app
- Speaker diarization: Optional feature to identify and label different speakers (requires HuggingFace token)
- Flexible output: Saves transcriptions to Markdown or plain text files with session date and duration
- System audio capture: Record and transcribe system audio (video calls, podcasts, YouTube) alongside your microphone using BlackHole
- Configurable settings: Choose Whisper model size, output format, microphone, and more
## Requirements

- macOS
- Python 3.10+
- Microphone access
## Installation

- Clone or download this repository:

  ```bash
  cd /Users/apple/localtranslate
  ```

- Create a virtual environment (recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Grant microphone permissions when prompted on first run.
## Usage

Run the app:

```bash
python main.py
```

The app will appear in your menu bar.
- Start Recording / Stop Recording: Toggle audio capture and transcription
- Open Output Folder: Open the folder containing transcription files
- Settings...: Configure the app
- Quit: Exit the application
## Settings

Access settings through the menu bar to configure:
| Setting | Description | Default |
|---|---|---|
| Output Folder | Where transcription files are saved | ~/Documents/Transcriptions |
| Whisper Model | Model size (tiny/base/small/medium/large) | base |
| Microphone | Input device selection | System default |
| Include System Audio | Capture system audio (requires BlackHole) | Disabled |
| System Audio Device | Device for system audio capture | (empty) |
| File Format | Output format (.md or .txt) | Markdown |
| Include Timestamps | Add timestamps to transcription lines (planned) | Enabled |
| Enable Diarization | Identify different speakers | Disabled |
| HuggingFace Token | Required for speaker diarization | (empty) |
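Since settings are stored in `~/.localtranslate/config.json`, a loader can merge the saved file over the defaults in the table. A minimal sketch, assuming key names that mirror the table (they are illustrative, not necessarily the app's actual schema):

```python
import json
from pathlib import Path

# Defaults mirroring the settings table above; key names are assumptions.
DEFAULTS = {
    "output_folder": str(Path.home() / "Documents" / "Transcriptions"),
    "whisper_model": "base",
    "microphone": None,              # None = system default input
    "include_system_audio": False,
    "system_audio_device": "",
    "file_format": "md",
    "include_timestamps": True,
    "enable_diarization": False,
    "huggingface_token": "",
}

def load_settings(path=Path.home() / ".localtranslate" / "config.json"):
    """Merge the stored config (if any) over the defaults."""
    settings = dict(DEFAULTS)
    if Path(path).exists():
        settings.update(json.loads(Path(path).read_text()))
    return settings
```

Merging over defaults means a hand-edited config file only needs to contain the keys being changed.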
## Whisper Models

| Model | Size | Speed | Accuracy | Recommended For |
|---|---|---|---|---|
| tiny | ~39MB | Fastest | Basic | Quick testing |
| base | ~74MB | Fast | Good | Daily use |
| small | ~244MB | Moderate | Better | Higher accuracy |
| medium | ~769MB | Slow | High | Quality transcription |
| large | ~1.5GB | Slowest | Best | Maximum accuracy |
The model will be downloaded automatically on first use.
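Whisper caches downloaded checkpoints under `~/.cache/whisper/`, so you can check whether a first-run download is pending. A small sketch using the sizes from the table; the `<name>.pt` filename is a heuristic (recent large checkpoints use versioned names), not the library's own lookup:

```python
from pathlib import Path

# Approximate download sizes from the table above, in MB.
MODEL_SIZES_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1500}

def model_status(name, cache_dir=Path.home() / ".cache" / "whisper"):
    """Report whether a Whisper checkpoint appears to be cached already."""
    if name not in MODEL_SIZES_MB:
        raise ValueError(f"unknown model: {name}")
    cached = (Path(cache_dir) / f"{name}.pt").exists()
    return {
        "model": name,
        "cached": cached,
        "download_mb": 0 if cached else MODEL_SIZES_MB[name],
    }
```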
## Speaker Diarization

To enable speaker identification:
- Create a HuggingFace account at https://huggingface.co
- Accept the terms for `pyannote/speaker-diarization-3.1`
- Generate an access token at https://huggingface.co/settings/tokens
- Enter the token in Settings and enable diarization
Note: Diarization adds processing latency and requires additional model downloads.
## System Audio Capture

Capture both microphone AND system audio simultaneously. This enables transcribing video calls, podcasts, YouTube videos, or any audio playing on your Mac alongside your voice.
Install BlackHole:

```bash
brew install blackhole-2ch
```

- Open Audio MIDI Setup (`/Applications/Utilities/Audio MIDI Setup.app`)
- Click the + button at the bottom left → Create Multi-Output Device
- Check both your speakers/headphones AND "BlackHole 2ch"
- Right-click the new Multi-Output Device → Use This Device For Sound Output
This setup allows you to hear audio normally while BlackHole captures it for transcription.
- Open LocalTranslate Settings from the menu bar
- Check "Include System Audio"
- Select "BlackHole 2ch" from the System Audio Device dropdown
- Save settings
Now when you record, both your microphone input and system audio will be captured and transcribed together.
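Combining the two captured streams amounts to summing them and normalizing so the sum cannot clip. A minimal sketch with NumPy, assuming both streams are mono float32 at the same sample rate (the app's actual mixer may differ):

```python
import numpy as np

def mix_streams(mic, system):
    """Mix two mono float32 buffers and normalize to prevent clipping."""
    n = max(len(mic), len(system))
    mixed = np.zeros(n, dtype=np.float32)
    mixed[:len(mic)] += mic
    mixed[:len(system)] += system
    peak = np.max(np.abs(mixed))
    if peak > 1.0:            # only attenuate when the sum actually clips
        mixed /= peak
    return mixed
```

Normalizing only when the peak exceeds 1.0 leaves quiet passages untouched instead of pumping the overall level up and down.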
## Output Format

Transcriptions are saved with a header containing the date and session duration:
Markdown (.md):

```markdown
# Transcription

**Date:** 2024-01-15 14:30:00
**Duration:** 00:00:15

---

Hello, this is a test recording. The transcription appears after recording stops.
```

With speaker diarization:

```markdown
**Speaker 1:** Hello, this is a test recording.
**Speaker 2:** Yes, I can see the transcription working.
```

## Configuration

Settings are stored in `~/.localtranslate/config.json`. You can edit this file directly or use the Settings window.
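The header shown in the example can be produced with a few lines of string formatting. A sketch (the function name is illustrative, not the app's actual code):

```python
from datetime import datetime

def format_header(start: datetime, duration_seconds: int) -> str:
    """Build the Markdown transcription header shown in the example above."""
    h, rem = divmod(duration_seconds, 3600)
    m, s = divmod(rem, 60)
    return (
        "# Transcription\n\n"
        f"**Date:** {start:%Y-%m-%d %H:%M:%S}\n"
        f"**Duration:** {h:02d}:{m:02d}:{s:02d}\n\n"
        "---\n\n"
    )
```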
## Troubleshooting

If no audio is being captured:

- Ensure microphone permissions are granted in System Preferences > Security & Privacy > Privacy > Microphone
- Check that the correct microphone is selected in Settings

About model downloads:

- First run requires an internet connection to download the Whisper model
- Models are cached in `~/.cache/whisper/`

If transcription is slow:

- Try a smaller model (tiny or base)
- Disable diarization if not needed
- Close other resource-intensive applications
If the Settings window doesn't appear:

```bash
brew install python-tk
```

## Dependencies

- `openai-whisper`: Local speech recognition
- `sounddevice`: Cross-platform audio capture
- `numpy`: Audio data processing
- `rumps`: macOS menu bar integration
- `torch`: Neural network backend
- `pyannote.audio`: Speaker diarization (optional feature)
## Changelog

System Audio Capture:

- BlackHole integration: Capture system audio alongside microphone input using BlackHole as a virtual audio loopback device
- Audio mixing: Mic and system audio streams are mixed together with normalization to prevent clipping
- Settings UI: Added checkbox and device dropdown in the Settings window to enable and configure system audio capture
- Documentation: Added setup instructions for installing BlackHole and creating a Multi-Output Device
Diarization Fixes:
- Fixed timing alignment: Diarization now runs on the same processed audio as transcription, fixing speaker label misalignment issues
- Updated pyannote 3.x API: Fixed compatibility with pyannote.audio 3.x, which changed the output format from `Annotation` to `DiarizeOutput`
- Fixed tensor format: Corrected waveform tensor shape from `(batch, channel, time)` to `(channel, time)` as required by pyannote
- Improved error reporting: Added detailed logging showing detected speaker segments with timestamps, warnings when no segments are found, and full tracebacks on errors
- Fixed boundary matching: Speaker segment matching now uses exclusive end bounds to prevent double-matching at segment boundaries
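The boundary-matching fix can be illustrated with a lookup that treats segment ends as exclusive, so a word starting exactly on a boundary matches only the later segment. A sketch (illustrative, not the app's actual code):

```python
def speaker_at(segments, t):
    """Return the speaker label active at time t.

    Segments are (start, end, label) tuples. The end bound is exclusive,
    so t == end of one segment matches the next segment instead of both.
    """
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None        # t falls outside every detected segment
```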
Audio Processing Improvements:
- Lowered high-pass filter: Changed from 80Hz to 60Hz to preserve male voice fundamentals (85-180Hz range)
- Reduced noise gate threshold: Lowered from 0.01 to 0.005 RMS to avoid cutting quiet speech
- Enabled pre-emphasis filter: Boosts high frequencies/consonants for improved speech recognition accuracy
- Improved padding: Changed from zero-padding to edge-padding for short audio clips to avoid confusing Whisper
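The noise gate, pre-emphasis, and edge-padding steps above can be sketched in NumPy as follows. The gate threshold comes from the changelog; the pre-emphasis coefficient and minimum length are assumptions, and the 60Hz high-pass stage is omitted for brevity:

```python
import numpy as np

def preprocess(audio, gate_rms=0.005, preemph=0.97, min_len=16000):
    """Sketch of the noise gate, pre-emphasis, and edge-padding steps."""
    # Noise gate: silence buffers whose RMS falls below the threshold
    if np.sqrt(np.mean(audio ** 2)) < gate_rms:
        return np.zeros(max(len(audio), min_len), dtype=audio.dtype)
    # Pre-emphasis: y[n] = x[n] - a*x[n-1] boosts high frequencies/consonants
    emphasized = np.append(audio[0], audio[1:] - preemph * audio[:-1])
    # Edge-pad (repeat the last sample) rather than zero-pad short clips
    if len(emphasized) < min_len:
        emphasized = np.pad(emphasized, (0, min_len - len(emphasized)), mode="edge")
    return emphasized
```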
## License

MIT License