This project transcribes audio files with the onnx_asr library, using multiple GPUs to batch-process audio into .srt subtitle files. The goal is to make it simple to run locally so anyone can experiment with these libraries. It was tested on 2x NVIDIA 24GB GPUs, but you can specify how many GPUs to use and how many files to process on each GPU at a time. Everything runs inside a Conda virtual environment, so the correct CUDA environment is easy to load and switch out of. This keeps your system-level environment clean and lets you use different versions of CUDA and other NVIDIA libraries in other projects on the same machine.
- Linux
- NVIDIA container toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- Miniconda
- FFmpeg
Put the audio files you want to transcribe in the files/ folder and run this script from inside files/. It currently converts .mp3 files; modify as needed.
for f in *.mp3; do ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.*}.wav"; done
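If you prefer to drive the conversion from Python, here is a minimal sketch of the same step. It assumes `ffmpeg` is on your PATH; the `ffmpeg_cmd` helper is illustrative, not part of this repo.

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(src: Path) -> list[str]:
    """Build the same ffmpeg command as the shell loop above:
    16 kHz, mono, 16-bit PCM WAV written next to the source file."""
    return [
        "ffmpeg", "-i", str(src),
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit PCM audio codec
        str(src.with_suffix(".wav")),
    ]

if __name__ == "__main__":
    # Convert every .mp3 in the current directory
    for mp3 in sorted(Path(".").glob("*.mp3")):
        subprocess.run(ffmpeg_cmd(mp3), check=True)
```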
conda create -n onnx_env python=3.12
conda activate onnx_env
conda install cuda=12.8 -c nvidia/label/cuda-12.8.1
conda install nvidia::cudnn cuda-version=12.8
pip install onnx-asr onnxruntime-gpu huggingface_hub soundfile
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
conda activate onnx_env
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
Run one of the following scripts:
python onnx.py
python onnx-single.py
python onnx-max.py
- onnx.py - runs transcriptions on multiple GPUs
- onnx-single.py - runs transcriptions on a single GPU
- onnx-max.py - runs multiple transcriptions at once on multiple GPUs
- files/ - folder with audio files to process
- models/ - folder with onnx models
- Run each transcription as a subprocess so GPU memory is fully released when it exits:

import subprocess

def transcribe_with_subprocess(audio_file):
    # transcribe_script.py contains only the loading, transcribing, and saving code
    subprocess.run(["python", "transcribe_script.py", "--audio", audio_file])
    # GPU memory is freed when the subprocess finishes
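Per-GPU batching can be wired on top of that subprocess pattern. The sketch below is my assumption about the approach, not code from onnx.py: the worker-count math and the round-robin `CUDA_VISIBLE_DEVICES` assignment are placeholders you would tune.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 2        # how many GPUs to use
FILES_PER_GPU = 2   # concurrent transcriptions per GPU

def assign_gpu(index: int) -> int:
    """Round-robin a file index onto a GPU id."""
    return index % NUM_GPUS

def transcribe(index: int, audio_file: str) -> None:
    # Pin the subprocess to one GPU via CUDA_VISIBLE_DEVICES.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(assign_gpu(index)))
    # Each file runs in its own process, so GPU memory is released
    # as soon as that process exits.
    subprocess.run(
        ["python", "transcribe_script.py", "--audio", audio_file],
        env=env, check=True,
    )

def run_all(files: list[str]) -> None:
    # Threads are enough here: the heavy work happens in the
    # child processes, the threads just wait on them.
    with ThreadPoolExecutor(max_workers=NUM_GPUS * FILES_PER_GPU) as pool:
        for i, f in enumerate(files):
            pool.submit(transcribe, i, f)
```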
Tested by processing 118 .wav files totaling 9.5 GB:
| Script | Processing Time | Single File Average | Failures |
|---|---|---|---|
| onnx-max.py | 10m 41s | 5.4s | 5 (out-of-memory errors) |
| onnx.py | 15m 42s | 7.9s | 0 |
| onnx-single.py | 26m 1s | 13s | 0 |
- Split out files that would take more than 50% of VRAM so onnx-max.py can run without errors, and process those edge-case files differently
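That TODO could start from a size-based split like the one below. The `split_by_size` helper and the idea of using on-disk size as a proxy for VRAM use are my assumptions; actual VRAM consumption depends on the model and audio length, so the threshold would need tuning.

```python
from pathlib import Path

def split_by_size(files: list[Path], max_bytes: int) -> tuple[list[Path], list[Path]]:
    """Partition files into (safe, oversized) by on-disk size,
    a rough proxy for peak VRAM during transcription."""
    safe: list[Path] = []
    oversized: list[Path] = []
    for f in files:
        (oversized if f.stat().st_size > max_bytes else safe).append(f)
    return safe, oversized
```

The `safe` list could then go through the onnx-max.py-style concurrent path, while `oversized` files are processed one at a time.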