- Get the data from https://www.dropbox.com/sh/x3zpttp7bjevb3r/AAAeFLnIeBMBXa9DNQD4a8TOa?e=2&dl=0 and put the contents inside `data/raw/NSVA_Data/NSVA_Data`.
- Run `webscraper.py`.
- Note that the games have been filtered to 'dal' games only, i.e. the Dallas Mavericks.
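The 'dal' filter can be sketched as a simple game-id check. This is illustrative only; the real selection happens inside `webscraper.py`, and all game ids below except the first are made up:

```python
# Illustrative sketch of the 'dal' game filter; the real logic lives in
# webscraper.py. Game ids follow the NSVA pattern {game_id}-{home}-vs-{away}.
game_ids = [
    "0021800013-dal-vs-phx",
    "0021800021-bos-vs-nyk",
    "0021800044-dal-vs-min",
]

# Keep only Dallas Mavericks games, i.e. ids containing a 'dal' segment.
dal_games = [g for g in game_ids if "dal" in g.split("-")]
```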
- Run `pip install -r requirements.txt`.
- Create a `.env` file and place it under `/src`; it must contain `OPENAI_API_KEY=<API_KEY>`.
- Place the video files extracted by the webscraper in the directory `/data/raw/NSVA_Video/`.
- Place the webscraper-generated metadata file under `/data/processed/final_results_{game}.csv`.
- The output commentary will be placed in `/data/text/GPT4o/{game}_commentary_results.csv`.

Note: `{game}` refers to the game_id, which is available in the NSVA file, e.g. `0021800013-dal-vs-phx`.
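As a rough sketch of how the metadata CSV might feed the GPT-4o commentary step, the play-by-play rows can be flattened into a single prompt. The column names (`event_id`, `clock`, `description`) are assumptions; check the real `final_results_{game}.csv` header before reusing this:

```python
import csv
import io

# Hypothetical rows mimicking final_results_{game}.csv from the webscraper;
# the real column names may differ.
sample_csv = """event_id,clock,description
1,11:42,Doncic makes 26-foot three point jumper
2,11:18,Booker misses driving layup
"""

def build_prompt(csv_text):
    """Turn one game's play-by-play rows into a single commentary prompt."""
    rows = csv.DictReader(io.StringIO(csv_text))
    plays = "\n".join(f"[{r['clock']}] {r['description']}" for r in rows)
    return (
        "You are an NBA commentator. Write live commentary for these plays:\n"
        + plays
    )

prompt = build_prompt(sample_csv)
```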
- Run `pip install -r requirements.txt`.
- Determine the number of action-recognition classes (currently 5, based on the dataset).
- Finetune with the specified hyperparameters (e.g. learning rate, epochs, etc.).
- The default finetuning scheme unfreezes the last 3 layers and uses the standard cross-entropy loss.
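A minimal PyTorch sketch of the default scheme, assuming a generic backbone (the `nn.Sequential` stack is a stand-in for the repo's actual action-recognition model): freeze everything, unfreeze the last 3 layers, and optimize cross-entropy over the 5 classes.

```python
import torch
import torch.nn as nn

# Hypothetical backbone standing in for the repo's action-recognition model.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 5),  # 5 action-recognition classes
)

# Freeze everything, then unfreeze the last 3 layers (the default scheme).
for p in model.parameters():
    p.requires_grad = False
for layer in list(model.children())[-3:]:
    for p in layer.parameters():
        p.requires_grad = True

# Standard cross-entropy loss; only the unfrozen parameters are optimized.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```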
- Go to `notebooks/inference.ipynb`.
- Plug in your own `generate_captions` model.
- In the last cell of the notebook, run the for loop, making minor changes as needed. The video data used can be found in the Google Drive link here.
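The notebook's final loop might look roughly like this; `generate_captions` is the model you supply yourself, and the directory and `*.mp4` pattern are assumptions:

```python
import pathlib

# Stand-in for your own captioning model; replace the body with a real call.
def generate_captions(video_path):
    return f"caption for {video_path.name}"

# Assumed location and extension of the webscraped videos.
video_dir = pathlib.Path("data/raw/NSVA_Video")
results = {v.name: generate_captions(v) for v in sorted(video_dir.glob("*.mp4"))}
```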
- Install the required library with `pip install langchain-openai==0.1.7`, or just run `pip install -r requirements.txt`.
- Since the model runs locally, follow the install instructions for LM Studio / LocalAI.
- Run `python src/text_personification.py` from the root directory (or anywhere; it doesn't matter).
- The script will output both the personified text and the token usage.
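In essence, the personification step wraps each caption in a persona-styled rewrite prompt before sending it to the (local) chat model. The persona string and template below are illustrative, not the repo's actual prompt:

```python
# Illustrative persona and template; the real ones live in
# src/text_personification.py.
PERSONA = "an enthusiastic courtside announcer"

def personify_prompt(caption, persona=PERSONA):
    """Build the rewrite prompt sent to the chat model for one caption."""
    return (
        f"Rewrite the following play-by-play caption in the voice of {persona}:\n"
        f"{caption}"
    )

prompt = personify_prompt("Doncic makes 26-foot three point jumper")
```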
- Download the models from https://huggingface.co/enlyth/baj-tts/tree/main/models and put them inside the `models` directory.
- Install the requirements: `pip install TTS==0.22.0`, or just run `pip install -r requirements.txt`.
- Run `python src/tts.py` from the root directory.
- Check the generated `.wav` file in the `output` directory.
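As a sanity check on the output path, here is a stdlib-only sketch that writes a one-second silent `.wav` into `output/`; it stands in for the real baj-tts synthesis done by `src/tts.py` (the file name and sample rate are assumptions):

```python
import pathlib
import struct
import wave

# Placeholder for the synthesis step in src/tts.py: write a short silent
# .wav to the output directory instead of real TTS audio.
out_dir = pathlib.Path("output")
out_dir.mkdir(exist_ok=True)
out_path = out_dir / "commentary.wav"

with wave.open(str(out_path), "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(22050)  # typical TTS sample rate
    wf.writeframes(struct.pack("<h", 0) * 22050)  # one second of silence
```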
This is the main program. Given the captions of the videos, the models will run end to end and return a `.wav` file of the personified caption.

- Run `python main.py` from the root directory.
