Skip to content

A command-line tool for transcribing audio files in a folder to a metadata.csv file, using OpenAI's Whisper.

License

Notifications You must be signed in to change notification settings

veralvx/trainscribe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Trainscribe

Trainscribe is a command-line tool that transcribes audio files in a specified folder using OpenAI's Whisper and generates a metadata.csv file. The produced metadata file is intended to use in training/finetune of text to speech (TTS) models, and may use one of the following formats:

  • file_id|transcribed_text, or
  • file_id|transcribed_text|speaker, if a speaker label is provided.

This is similar to LJ Speech format, but lacks an additional field with normalized transcribed text for pronuciation. Particularly, file_id|transcribed_text may be used in projects like piper-train, and file_id|transcribed_text|speaker in xtts-finetune.

Requirements

  • Python >=3.10, <3.14
  • uv
  • ffmpeg (install with sudo apt install ffmpeg)

Usage

Run the tool with:

uvx trainscribe --folder /path/to/audio/folder [options]
Transcribe a folder of audio files to metadata.csv using Whisper.

options:
  -h, --help            show this help message and exit
  --folder, -f FOLDER   Folder with audio files
  --lang, -l LANG       Language code for transcription (e.g. 'en')
  --model, -m MODEL     Whisper model name (tiny, base, small, medium, large, turbo)
  --speaker, -s SPEAKER
                        Speaker label to add to metadata lines
  --device, -d DEVICE   Device for whisper model (cuda/cpu)
  --output, -o OUTPUT

Example

Transcribe English audio in dataset/wavs using the medium model:

uvx trainscribe --folder dataset/wavs --lang en --model medium 

This generates dataset/wavs/metadata.csv