This repository contains the code for a spatio-temporal action detection project that implements TubeR, a tubelet-level transformer-based spatio-temporal action detection model.
The project trains and evaluates TubeR models on the AVA 2.1 and JHMDB datasets. For more details of the project, check out the dissertation and the intro video. The project will continue to be updated on GitHub.
| Dataset | Backbone | Pre-train | #view | IoU | mAP |
|---|---|---|---|---|---|
| AVA2.1 | CSN-152 | Kinetics-400+IG65M | 1 view | 0.5 | 0.29 |
| JHMDB-21 | CSN-152 | Kinetics-400+IG65M | 1 view | 0.5 | 0.72 |
The project has been tested and works with:
- Python 3.7.12
- Torch 1.12.1 (initial) and Torch 2.0.0 (updated)
- CUDA 12.1
- timm==0.4.5
- tensorboardX
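A minimal dependency pin matching the versions above (a sketch of a `requirements.txt`; further transitive packages may be needed):

```text
torch==2.0.0       # 1.12.1 also worked initially
timm==0.4.5
tensorboardX
```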
To run the code of this project, you should download the following files from GitHub or the given resources, since Moodle limits uploads to 250 MB. Here is the GitHub link: https://github.com/YihangChen9/Spatio_temporal-action-detection
- Please download `asset.zip` (annotations of the AVA dataset) from the Google Drive, then unzip it into the `datasets/` directory.
- Please download `pre-train.zip` (the trained models) from the Google Drive, then put its contents under the `tuber` directory:
  - `TubeR_CSN152_JHMDB.pth` is the trained model for JHMDB.
  - `TubeR_CSN152_AVA.pth` is the trained model for AVA.
  - `irCSN_152_ft_kinetics_from_ig65m_f126851907.mat` is the pre-trained CSN-152 model used as the backbone.
  - `detr.pth` provides the initial transformer parameters for the AVA TubeR model.
- You can get JHMDB from this link; please download `JHMDB.tar.gz`. `JHMDB-GT.pkl` is the annotation file.
To get the AVA dataset, please run the three bash scripts in the `datasets` directory one by one (set the paths inside each bash file first):
- `download_ava.sh` downloads the original video clips.
- `chunk_video.sh` chunks each clip, keeping the segment from minute 15 to minute 30.
- `extract_frame.sh` extracts frames from the chunked clips.
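As a rough illustration of the minute-15-to-minute-30 cut that `chunk_video.sh` performs, the command it issues per clip can be sketched as below (`chunk_command` is a hypothetical helper; the actual flags in the script may differ):

```python
# Build an ffmpeg command that keeps the minute-15-to-minute-30 segment of a
# clip, mirroring what chunk_video.sh does. Hypothetical helper for
# illustration only; the real script's flags may differ.
def chunk_command(src: str, dst: str, start_min: int = 15, end_min: int = 30) -> str:
    start_s, end_s = start_min * 60, end_min * 60  # convert minutes to seconds
    return f"ffmpeg -ss {start_s} -to {end_s} -i {src} -c copy {dst}"

print(chunk_command("videos/clip.mp4", "chunked/clip.mp4"))
# ffmpeg -ss 900 -to 1800 -i videos/clip.mp4 -c copy chunked/clip.mp4
```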
To evaluate the model, first modify the config file (in `configuration`; one for AVA, another for JHMDB):
- Set `WORLD_SIZE`, `GPU_WORLD_SIZE`, `DIST_URL`, and `WOLRD_URLS` correctly based on your experiment setup.
- Set `LABEL_PATH`, `ANNO_PATH`, and `DATA_PATH` to your local directories accordingly.
- Download the pre-trained model and set `PRETRAINED_PATH` to the model path.
- Make sure `LOAD` and `LOAD_FC` are set to True.
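Assuming the config files are plain YAML with the key names listed above, the evaluation settings might look like the following (the actual nesting in the project's configs may differ, and the values are illustrative placeholders, not the project's defaults):

```yaml
# Hypothetical excerpt of an evaluation config; placeholders only.
WORLD_SIZE: 1                    # number of nodes
GPU_WORLD_SIZE: 1                # GPUs per node
DIST_URL: tcp://localhost:23456
WOLRD_URLS: ['localhost']        # key spelled as in the config files
LABEL_PATH: /data/ava/labels
ANNO_PATH: /data/ava/annotations
DATA_PATH: /data/ava/frames
PRETRAINED_PATH: pre-train/TubeR_CSN152_AVA.pth
LOAD: True
LOAD_FC: True
```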
Then set the path to `tuber` and run:

```shell
# run evaluation
python3 eval_tuber_jhmdb.py
python3 eval_tuber_ava.py
```
To train TubeR from scratch, first modify the config file:
- Set `WORLD_SIZE`, `GPU_WORLD_SIZE`, `DIST_URL`, and `WOLRD_URLS` correctly based on your experiment setup.
- Set `LABEL_PATH`, `ANNO_PATH`, and `DATA_PATH` to your local directories accordingly.
- Download the pre-trained feature backbone and transformer weights, then set `PRETRAIN_BACKBONE_DIR` and `PRETRAIN_TRANSFORMER_DIR` (the latter only for the AVA dataset) accordingly.
- Make sure `LOAD` and `LOAD_FC` are set to False.
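Under the same assumption of flat YAML keys (placeholders, not the project's defaults), the training-specific fields might look like:

```yaml
# Hypothetical excerpt of a training config; placeholders only.
PRETRAIN_BACKBONE_DIR: pre-train/irCSN_152_ft_kinetics_from_ig65m_f126851907.mat
PRETRAIN_TRANSFORMER_DIR: pre-train/detr.pth   # only for AVA
LOAD: False
LOAD_FC: False
```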
Then run:

```shell
# run training from scratch
python3 train_tuber_jhmdb.py
python3 train_tuber_ava.py
```
To test on a video, first modify the config file (in `configuration`; one for AVA, another for JHMDB):
- Set `WORLD_SIZE`, `GPU_WORLD_SIZE`, `DIST_URL`, and `WOLRD_URLS` correctly based on your experiment setup.
- Set `ANNO_PATH`, `DATA_PATH` (video/JHMDB/test_frames), and `TEST_PATH` (the path of the input video; please put only one .avi or .mp4 video there) to your local paths accordingly.
- Download the pre-trained model and set `PRETRAINED_PATH` to the model path.
- Make sure `LOAD` and `LOAD_FC` are set to True.
Then set the path to `tuber` and run:

```shell
# run the detection system
python3 detection_system.py
```