Thank you for sharing your great work!
Your scripts provide evaluation for LLaVA-OV with VisionZip, but I wonder whether VisionZip can also be applied to LLaVA-Video. For example, could I change
```shell
WRAPPER=visionzip SPATIAL_TOKENS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --num_processes=8 --main_process_port=25000 \
-m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32 \
```
into
```shell
WRAPPER=visionzip SPATIAL_TOKENS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --num_processes=8 --main_process_port=25000 \
-m lmms_eval \
--model llava_vid \
--model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,mm_spatial_pool_mode=average,max_frames_num=64 \
```
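For context, here is a sketch of what I imagine the full modified command would look like, with the trailing task/output flags filled in. The task name (`videomme`), batch size, and output path are my own assumptions for illustration, not taken from your scripts:

```shell
# Hypothetical full command for evaluating LLaVA-Video with VisionZip.
# --tasks / --batch_size / --output_path values below are assumed, not from the repo.
WRAPPER=visionzip SPATIAL_TOKENS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --num_processes=8 --main_process_port=25000 \
-m lmms_eval \
--model llava_vid \
--model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,mm_spatial_pool_mode=average,max_frames_num=64 \
--tasks videomme \
--batch_size 1 \
--output_path ./logs/
```

My main uncertainty is whether the `WRAPPER=visionzip` hook patches the vision tower used by the `llava_vid` model class as well, or only the one in `llava_onevision`.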