A Node.js application that extracts audio from video files and transcribes them to Bulgarian text using OpenAI's Whisper model.
## Features

- Extract audio from video files (MP4, etc.)
- Convert audio to the optimal format for transcription (WAV, 16kHz, mono)
- Transcribe audio to Bulgarian text using a local Whisper installation
- Save the transcription to a text file
- Automatic cleanup of temporary files
## Prerequisites

- **Node.js** - download from [nodejs.org](https://nodejs.org)
- **Python** - download from [python.org](https://www.python.org)
- **FFmpeg** - bundled automatically via `ffmpeg-static`; no manual install needed
## Setup

1. Install Python dependencies:

   ```sh
   py -m pip install -r requirements.txt
   ```

2. Install Node.js dependencies:

   ```sh
   npm install
   ```

3. Configure your environment:

   ```sh
   cp .env.example .env
   ```

   Then edit `.env` and set `INPUT_FOLDER` to the folder containing your audio files.
## Configuration

All configuration is done via the `.env` file (copy from `.env.example`):

```ini
# Path to the folder containing your audio files
INPUT_FOLDER=some_path

# Whisper model size: tiny, base, small, medium, large
# Larger models are more accurate but slower
WHISPER_MODEL=large
```

## Usage

Run against the folder configured in `.env`:

```sh
npm start
```

Or pass the folder path directly as an argument (overrides `.env`):

```sh
node transcribeAudio.js "your_folder_with_files_here"
```
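The precedence between the CLI argument and `.env` can be sketched as below. `resolveInputFolder` is a hypothetical helper for illustration, not an actual export of `transcribeAudio.js`:

```javascript
// Hypothetical helper illustrating the precedence: a CLI argument
// (argv[2]) wins over INPUT_FOLDER from .env (loaded into env by dotenv).
function resolveInputFolder(argv, env) {
  if (argv[2]) return argv[2]; // node transcribeAudio.js <folder>
  if (env.INPUT_FOLDER) return env.INPUT_FOLDER;
  throw new Error("Set INPUT_FOLDER in .env or pass a folder path");
}

// Typical call: resolveInputFolder(process.argv, process.env)
```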
## How It Works

1. **Folder Scan**: Reads all supported audio/video files from `INPUT_FOLDER`
   - Supported formats: `.m4a`, `.mp3`, `.mp4`, `.wav`, `.ogg`, `.flac`, `.aac`
   - Files that already have a matching `.txt` transcript are skipped automatically
2. **Audio Conversion**: Uses FFmpeg (bundled via `ffmpeg-static`) to convert each file to a temporary WAV
   - Mono channel, 16kHz sample rate, optimal for Whisper
   - The temp file is deleted after transcription
3. **Transcription**: Calls `transcribe.py`, which runs the local Whisper model
   - Specifically configured for the Bulgarian language
   - Model size is configurable via `WHISPER_MODEL`
4. **Output**: Each audio file gets a matching `.txt` transcript saved in the same folder
   - e.g. `session1.m4a` → `session1.txt`
## Project Structure

```
bg-transcriber/
├── transcribeAudio.js   # Main application file
├── transcribe.py        # Python script for Whisper transcription
├── package.json         # Node.js dependencies
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variable template
├── .env                 # Your local config (not committed)
└── README.md            # This file
```
## Output

Transcripts are saved alongside the source audio files:

```
D:\folder\folder\
├── session1.m4a
├── session1.txt   # generated transcript
├── session2.m4a
└── session2.txt   # generated transcript
```
## Dependencies

**Node.js**

- `fluent-ffmpeg` - FFmpeg wrapper for Node.js
- `ffmpeg-static` - Bundled FFmpeg binary (no system install required)
- `dotenv` - Environment variable management

**Python**

- `openai-whisper` - OpenAI's Whisper speech-to-text model
## Example Output

```
[*] Found 3 file(s) in: D:\folder\folder
── session1.m4a
[*] Converting: session1.m4a
[*] Transcribing...
[✔] Saved: session1.txt
── session2.m4a
[skip] session2.txt already exists
── session3.m4a
[*] Converting: session3.m4a
[*] Transcribing...
[✔] Saved: session3.txt
[✔] Done. 2 succeeded, 0 failed.
```
## Troubleshooting

- **Python not found**: Ensure Python is installed and accessible via the `py` command
- **Whisper not installed**: Run `py -m pip install -r requirements.txt`
- **Folder not found**: Verify `INPUT_FOLDER` is set correctly in `.env`
- **No files found**: Check that the folder contains supported audio formats (`.m4a`, `.mp3`, `.mp4`, etc.)
- **Slow transcription**: Try a smaller model (`WHISPER_MODEL=tiny` or `base`) in `.env`
## Manual Transcription

For direct Whisper usage without Node.js:

```sh
whisper path\to\your\audio.wav --language bg --model small
```

## License

ISC License

## Contributing

Feel free to submit issues and enhancement requests!