This is the official repository of the paper "Audio Visual Segmentation Through Text Embeddings".
conda create -n avt python=3.9.20
conda activate avt
pip install -r requirements.txt

- Follow the instructions at https://github.com/OpenNLPLab/AVSBench to download the AVSBench dataset.
- Modify the root path variables in utils/config_m3.py and utils/config_s4.py to match your local setup.
- Place all weights from here into the pretrained directory.
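Before training, a quick check of the setup steps above can save a failed run. The sketch below is a hypothetical helper, not part of the repository. It assumes it is run from the repository root and that the weights directory is literally named pretrained at that root; check_setup is an illustrative name. It only verifies that the two config files exist (which also catches running from the wrong directory) and that pretrained is non-empty. It cannot tell whether the root paths inside the configs were edited correctly.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of the repository). Assumes it is
# run from the repository root and that the weights live in a "pretrained"
# directory at that root.

# check_setup ROOT: prints what is missing and returns 1, or returns 0 if the
# two config files exist and ROOT/pretrained is a non-empty directory.
check_setup() {
  root="${1:-.}"
  rc=0
  for f in utils/config_m3.py utils/config_s4.py; do
    if [ ! -f "$root/$f" ]; then
      echo "missing config: $f"
      rc=1
    fi
  done
  # The -d test short-circuits, so ls never runs on a missing directory.
  if [ ! -d "$root/pretrained" ] || [ -z "$(ls -A "$root/pretrained")" ]; then
    echo "pretrained/ is missing or empty"
    rc=1
  fi
  return "$rc"
}

# Read-only: nothing is downloaded or modified.
if check_setup "${1:-.}"; then
  echo "setup looks complete"
else
  echo "setup incomplete"
fi
```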
python train_avs.py --evf_version evf_sam2 --projector_type mul --use_adapter --dataset s4 --batch_size 8
python train_avs.py --evf_version evf_sam2 --projector_type mul --use_adapter --dataset m3 --batch_size 8

Replace --name and --weight_path with your own run name and weight path.
python test_avs.py --dataset s4 --name name --evf_version evf_sam2 --projector_type mul --use_adapter --adapter_type mul --weight_path weight
python test_avs.py --dataset m3 --name name --evf_version evf_sam2 --projector_type mul --use_adapter --adapter_type mul --weight_path weight
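The training and testing commands differ only in --dataset and, for testing, in the run name and weight path. They can therefore be wrapped in a small loop over s4 and m3. The script below is a hypothetical convenience wrapper, not part of the repository. The filename run_all.sh, the helper names run and run_stage, and the environment variables NAME, WEIGHT_S4, WEIGHT_M3 and DRY_RUN are all assumptions. By default it only prints the commands; setting DRY_RUN=0 makes it execute them.

```sh
#!/bin/sh
# Hypothetical wrapper (not part of the repository) around the train/test
# commands above, looping over both AVSBench subsets (s4 and m3).
# Usage: sh run_all.sh train   or   sh run_all.sh test
set -eu

NAME="${NAME:-name}"              # placeholder value for --name
WEIGHT_S4="${WEIGHT_S4:-weight}"  # placeholder --weight_path for the s4 model
WEIGHT_M3="${WEIGHT_M3:-weight}"  # placeholder --weight_path for the m3 model
DRY_RUN="${DRY_RUN:-1}"           # 1 (default) = print commands; 0 = execute

# Flags shared by every command; left unquoted below so they split into words.
COMMON="--evf_version evf_sam2 --projector_type mul --use_adapter"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run (or print) one stage, "train" or "test", for s4 and then m3.
run_stage() {
  for ds in s4 m3; do
    case "$1" in
      train)
        run python train_avs.py $COMMON --dataset "$ds" --batch_size 8
        ;;
      test)
        if [ "$ds" = "s4" ]; then weight="$WEIGHT_S4"; else weight="$WEIGHT_M3"; fi
        run python test_avs.py --dataset "$ds" --name "$NAME" $COMMON \
          --adapter_type mul --weight_path "$weight"
        ;;
      *)
        echo "usage: $0 [train|test]" >&2
        return 1
        ;;
    esac
  done
}

run_stage "${1:-train}"
```

For example, DRY_RUN=0 sh run_all.sh train runs both training jobs in sequence. After that, NAME=my_run WEIGHT_S4=path/to/s4_weight WEIGHT_M3=path/to/m3_weight DRY_RUN=0 sh run_all.sh test evaluates them; the run name and weight paths here are placeholders. Training and testing are kept as separate stages because the test commands need the weight paths that training produces.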