Privacy-first, AI-powered meeting transcription with speaker diarization, completely offline!
Transform your meetings into searchable, speaker-labeled transcripts without sending any data to the cloud. This tool combines advanced speech recognition with speaker diarization to create professional meeting records.
- 100% Offline - No internet required, complete privacy
- Dual Audio Capture - Records both microphone and system audio
- Speaker Diarization - Automatically identifies different speakers
- Multi-language Support - Configurable language detection
- Cross-platform - Works on Windows and Linux
- Fast Processing - Optimized for CPU inference
- Multiple Output Formats - Plain text and speaker-labeled transcripts
- Easy Setup - Simple CLI interface
- Python 3.8 or higher
- Git LFS (for downloading models)
- Audio input device (microphone)
- ~2GB free disk space for models
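The prerequisites above can be sanity-checked with a short standalone script. This is just an illustrative sketch, not part of the project; the function names are made up here:

```python
import shutil
import sys

def check_python(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

def check_disk_space(path=".", needed_gb=2.0):
    """Return True if at least `needed_gb` gigabytes are free at `path`."""
    return shutil.disk_usage(path).free >= needed_gb * 1024 ** 3

def check_git_lfs():
    """Return True if the git-lfs binary is on PATH."""
    return shutil.which("git-lfs") is not None

if __name__ == "__main__":
    print("Python >= 3.8:", check_python())
    print("~2GB free disk:", check_disk_space())
    print("Git LFS found:", check_git_lfs())
```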
Linux/Mac:
git clone https://github.com/whitehole07/offline-meeting-transcriber.git
cd offline-meeting-transcriber
chmod +x install.sh
./install.sh

Windows:
git clone https://github.com/whitehole07/offline-meeting-transcriber.git
cd offline-meeting-transcriber
install.bat

The installation script will automatically:
- Create a virtual environment
- Install all Python dependencies
- Download Whisper model (~1.5GB) via git LFS
- Download SpeechBrain speaker model (~100MB) via git LFS
- Set up the complete environment
- Test the installation
If you prefer to install manually:
1. Clone the repository:

git clone https://github.com/whitehole07/offline-meeting-transcriber.git
cd offline-meeting-transcriber

2. Create a virtual environment:

python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # Linux/Mac

3. Install dependencies:

pip install -r requirements.txt

4. Download models:
- Whisper Model: Systran/faster-whisper-medium (~1.5GB)
  - Speech-to-text transcription model
  - Optimized for CPU inference
  - Supports multiple languages
- SpeechBrain Model: speechbrain/spkrec-ecapa-voxceleb (~100MB)
  - Speaker diarization model
  - ECAPA-TDNN architecture
  - Trained on the VoxCeleb dataset
If you need to download models manually:
Whisper Model (Systran/faster-whisper-medium):
git clone https://huggingface.co/Systran/faster-whisper-medium models/faster-whisper-medium

- Purpose: Converts speech to text
- Architecture: Transformer-based encoder-decoder
- Size: ~1.5GB
- Languages: 99 languages supported
- Optimization: CPU-optimized version of OpenAI Whisper
SpeechBrain Model (speechbrain/spkrec-ecapa-voxceleb):
git clone https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb models/spkrec-ecapa-voxceleb

- Purpose: Identifies different speakers in audio
- Architecture: ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation TDNN)
- Size: ~100MB
- Training: VoxCeleb dataset (1M+ utterances from 7K+ speakers)
- Features: Speaker embedding extraction and clustering
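The core idea behind embedding-based diarization is that each audio segment gets an embedding vector, and segments whose embeddings are similar are attributed to the same speaker. The real pipeline likely uses a proper clustering algorithm; this simplified, dependency-free sketch (not the project's actual code) shows the concept with greedy cosine-similarity assignment:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_speakers(embeddings, threshold=0.75):
    """Greedily assign each segment embedding to the most similar known
    speaker, creating a new speaker when no existing one is similar
    enough. Returns a list of SPEAKER_XX labels, one per segment."""
    centroids, labels = [], []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        if scores and max(scores) >= threshold:
            labels.append(f"SPEAKER_{scores.index(max(scores)):02d}")
        else:
            centroids.append(emb)
            labels.append(f"SPEAKER_{len(centroids) - 1:02d}")
    return labels
```

The threshold value is illustrative; real systems tune it (or use agglomerative/spectral clustering) because greedy assignment is order-dependent.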
# Record with microphone + system audio
python cli.py start
# Record system audio only (no microphone)
python cli.py start --no-mic

Press Ctrl+C to stop recording and automatically process the audio.
After processing, you'll find these files in output/YYYYMMDD/:
- recording_HHMMSS.wav - Original audio recording
- transcription_HHMMSS.txt - Plain text transcription
- diarized_HHMMSS.txt - Speaker-labeled transcription
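The naming scheme above can be reproduced with a small helper. This is a sketch of the layout described in this README, not the project's actual code:

```python
from datetime import datetime
from pathlib import Path

def output_paths(base="output", when=None):
    """Build the date-stamped output directory and the three file names
    following the output/YYYYMMDD/*_HHMMSS.* layout described above."""
    when = when or datetime.now()
    day = when.strftime("%Y%m%d")
    hms = when.strftime("%H%M%S")
    folder = Path(base) / day
    return {
        "recording": folder / f"recording_{hms}.wav",
        "transcription": folder / f"transcription_{hms}.txt",
        "diarized": folder / f"diarized_{hms}.txt",
    }
```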
Plain Transcription:
When you become fascist or communist or anarchist in those years, you can also simply be someone who never reasons...
Diarized Transcription:
SPEAKER_01 (00:00-01:10): When you become fascist or communist or anarchist in those years, you can also simply be someone who never reasons and therefore it's true yes I go with my friends and beat up those others...
SPEAKER_00 (01:10-02:14): We're not doing that well, no we're not doing that well guys, if it goes well you make me laugh...
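If you want to post-process diarized transcripts (e.g. per-speaker word counts), lines in the format shown above are easy to parse. A minimal sketch, assuming the exact `SPEAKER_XX (MM:SS-MM:SS): text` layout from the example:

```python
import re

# Matches lines like "SPEAKER_01 (00:00-01:10): some text..."
LINE_RE = re.compile(r"^(SPEAKER_\d+) \((\d{2}:\d{2})-(\d{2}:\d{2})\): (.*)$")

def parse_diarized_line(line):
    """Split one speaker-labeled line into (speaker, start, end, text),
    or return None if the line does not match the expected format."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return m.groups()
```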
Edit config.py to customize:
WHISPER_MODEL = "medium" # base, small, medium, large
WHISPER_LANGUAGE = "en" # Language code

Model download fails:

python -c "from faster_whisper import WhisperModel; WhisperModel('medium')"

Poor transcription quality:
- Ensure clear audio input
- Check microphone levels
- Try different Whisper model sizes
- Verify language setting matches audio
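One quick way to check microphone levels is to measure the RMS level of a recorded clip; signals far below full scale tend to transcribe poorly. This standalone sketch (not part of the project) works on signed 16-bit PCM samples:

```python
import math
import struct
import wave

def rms_dbfs(samples):
    """RMS level in dBFS for signed 16-bit PCM samples. 0 dBFS is full
    scale; levels far below roughly -40 dBFS are likely too quiet."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / 32768.0)

def wav_rms_dbfs(path):
    """Read a 16-bit mono WAV file and return its RMS level in dBFS."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return rms_dbfs(samples)
```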
Recording doesn't stop:
- Press Ctrl+C to stop recording and process audio
- The system will automatically transcribe and diarize after stopping
Installation issues:
- Make sure you have Python 3.8+ installed
- Ensure you have Git LFS installed (required for model downloads)
- Linux/Mac: run git lfs install (after installing the git-lfs package)
- Windows: Install Git LFS from https://git-lfs.github.io
- Ensure you have internet connection for model downloads
- If models fail to download, try running the installation script again
- Check that you have sufficient disk space (~2GB for models)
- On Windows, make sure you're running the batch file as administrator if needed
- If git clone fails, try git lfs pull in the model directories
Windows:
- Requires pyaudiowpatch for system audio capture
- WASAPI loopback support needed
- May need audio driver updates
Linux:
- Requires PulseAudio or PipeWire
- May need additional audio packages
- Check device permissions
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - Speech recognition
- SpeechBrain - Speaker diarization
- faster-whisper - Optimized Whisper implementation
- PyAudio - Audio I/O
Found a bug? Open an issue
Have a feature request? Start a discussion