
Question: Can VisionZip be applied to LLaVA-Video models? #15

@arsoyul

Thank you for sharing your great work!

Your scripts provide evaluation for LLaVA-OV with VisionZip, but I wonder whether VisionZip can also be applied to LLaVA-Video. For example, could I change

WRAPPER=visionzip SPATIAL_TOKENS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --num_processes=8 --main_process_port=25000 \
-m lmms_eval \
--model llava_onevision \
--model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32 \

into

WRAPPER=visionzip SPATIAL_TOKENS=20 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
accelerate launch --num_processes=8 --main_process_port=25000 \
-m lmms_eval \
--model llava_vid \
--model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,mm_spatial_pool_mode=average,max_frames_num=64 \
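For context on what the two commands above differ in: the `WRAPPER` and `SPATIAL_TOKENS` environment variables select the token-compression wrapper and its per-frame token budget, independently of which `--model` backend is launched. A minimal sketch of how such env-driven configuration could be read on the Python side is below; the function name and default values are my own assumptions for illustration, not code from this repo.

```python
import os

def read_wrapper_config(env=None):
    """Read wrapper settings from environment variables.

    WRAPPER names the token-compression wrapper (e.g. "visionzip");
    SPATIAL_TOKENS is the per-frame spatial token budget. The variable
    names mirror the commands above; the defaults are illustrative.
    """
    if env is None:
        env = os.environ
    wrapper = env.get("WRAPPER", "none")
    spatial_tokens = int(env.get("SPATIAL_TOKENS", "196"))
    return wrapper, spatial_tokens

# The settings used in the commands above:
cfg = read_wrapper_config({"WRAPPER": "visionzip", "SPATIAL_TOKENS": "20"})
print(cfg)  # ('visionzip', 20)
```

If wrapper selection really is driven only by these environment variables, the same settings would apply unchanged whether `--model llava_onevision` or `--model llava_vid` is launched, which is why the substitution above seems plausible.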
