A website for AI-based transcript and subtitle generation, including a small built-in subtitle editor. It uses
- Stable Whisper (which is a slight modification of OpenAI Whisper) for audio transcription,
- NVIDIA NeMo for speaker diarization,
- PANNs inference for sound event detection,
- DeepL Translator for translation,
- FFmpeg for video/audio editing,
- Flask as the web framework,
- and Bootstrap for website CSS.
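As a rough illustration of the pipeline's final stage, the sketch below formats transcribed segments as SRT subtitle cues. The segment structure and helper names are hypothetical stand-ins for whatever Stable Whisper returns, not code from this repository:

```python
# Hypothetical sketch: turning transcribed segments into SRT subtitle cues.
# The segment dicts stand in for a transcription result; names and
# structure are illustrative, not this project's actual code.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {'start', 'end', 'text'} dicts as an SRT document."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(cues)

example = [{"start": 0.0, "end": 2.4, "text": "Hello world."}]
print(segments_to_srt(example))
```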
This application was developed as part of a bachelor thesis.
example_3.mp4
Video URL: https://www.youtube.com/watch?v=L6yE7fUE220
More examples can be found here.
- Python 3.10.11 (Download here)
- Microsoft Visual C++ 14.0 or greater (Download here)
- FFmpeg 6.0 or greater (Download here)
- DeepL API Key (Create a free account here)
- Clone repository:
git clone https://github.com/philipp821/subtitle-generator.git
- Move into repository directory and create virtual environment:
python -m venv venv
- Activate virtual environment:
venv\Scripts\activate
- Install packages:
pip install Cython
pip install -r requirements.txt
python -m textblob.download_corpora lite
- Download pretrained model for sound event detection and store it in:
\data\configs\panns_inference\Cnn14_DecisionLevelMax_mAP=0.385.pth
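A quick way to confirm the checkpoint landed in the right place is a small path check before starting the server. This helper is purely illustrative and not part of the repository:

```python
# Illustrative startup check (not part of this repository): verify the
# PANNs checkpoint exists and is non-empty before launching the server.
from pathlib import Path

def checkpoint_ready(path) -> bool:
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0

# Expected location relative to the project root:
ckpt = Path("data") / "configs" / "panns_inference" / "Cnn14_DecisionLevelMax_mAP=0.385.pth"
print(checkpoint_ready(ckpt))  # False until the model has been downloaded
```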
- Put your DeepL API key in a file named deepl.key and store it in the root directory:
\deepl.key
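The server then presumably reads the key from that file at startup. A stdlib sketch of such a loader (hypothetical helper, not this project's actual code):

```python
# Hypothetical sketch of loading the DeepL API key from deepl.key.
# This mirrors the setup step above but is not this project's actual loader.
from pathlib import Path

def load_deepl_key(path: str = "deepl.key") -> str:
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your DeepL API key into it")
    return key

# Demo on a throwaway file so the sketch is runnable anywhere:
demo = Path("deepl_demo.key")
demo.write_text("your-api-key-here\n", encoding="utf-8")
print(load_deepl_key(str(demo)))  # -> your-api-key-here
demo.unlink()
```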
- Activate virtual environment if not already done:
venv\Scripts\activate
- Run the webserver.py file:
python src\webserver.py
- Enter http://localhost:5000/ in your browser if it does not open by itself.
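If the page does not open automatically, it can also be launched from a short script. `webbrowser` is Python's stdlib module, and port 5000 is Flask's development-server default; adjust the URL if webserver.py is configured differently:

```python
# Open the running app in the default browser (stdlib only).
# Port 5000 is Flask's development default.
import webbrowser

url = "http://localhost:5000/"
opened = webbrowser.open(url)  # True if a browser could be launched
print(url, opened)
```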