RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296 #33

@EasonChaozhou

Description

Thank you very much for your work. When running main.py with the Qwen2-VL model, I encountered the following error:
(openemma-env) root@gpu-node1:OpenEMMA-main# python main.py
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
set VIDEO_TOTAL_PIXELS: 90316800
qwen
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/cuda/__init__.py:155: UserWarning:
NVIDIA H800 with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H800 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.07s/it]

Loading NuScenes tables for version v1.0-trainval...
Loading nuScenes-lidarseg...
32 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
34149 lidarseg,
Done loading in 27.960 seconds.

Reverse indexing ...
Done reverse indexing in 6.2 seconds.

Number of scenes: 850
Scene scene-0103 has 40 frames
Created a temporary directory at /tmp/tmpmjfgbonx
Writing /tmp/tmpmjfgbonx/_remote_module_non_scriptable.py
Traceback (most recent call last):
  File "main.py", line 420, in <module>
    updated_intent) = GenerateMotion(obs_images, obs_ego_traj_world, obs_ego_velocities,
  File "main.py", line 204, in GenerateMotion
    scene_description = SceneDescription(obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 168, in SceneDescription
    result = vlm_inference(text=prompt, images=obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 86, in vlm_inference
    generated_ids = model.generate(**inputs, max_new_tokens=128)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1686, in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1041, in forward
    rotary_pos_emb = self.rot_pos_emb(grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1014, in rot_pos_emb
    hpos_ids = hpos_ids.reshape(
RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296
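
For context on the error itself: in `modeling_qwen2_vl.py`, `rot_pos_emb` reshapes an h*w position-id grid into `(h // spatial_merge_size, spatial_merge_size, w // spatial_merge_size, spatial_merge_size)`, and `spatial_merge_size` defaults to 2 for Qwen2-VL. A target shape of `[0, 2, 0, 2]` therefore means `h` and `w` were read back as 0 or 1, yet the flat tensor still held 7296 elements; both cannot be right, which suggests corrupted GPU-side values rather than bad input images. A minimal sketch of why the message reads this way (the numbers come from the trace above, not from OpenEMMA itself):

```python
import torch

merge = 2                  # Qwen2-VL spatial_merge_size default
h, w = 0, 0                # grid height/width as apparently read at crash time
flat = torch.arange(7296)  # the position-id tensor actually held 7296 elements

# Raises: RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296
flat.reshape(h // merge, merge, w // merge, merge)
```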
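The most likely root cause is the warning near the top of the log: the installed PyTorch wheel only ships kernels up to sm_86, while the H800 is sm_90, so CUDA ops can silently misbehave. A quick sanity check before reinstalling (a sketch, not part of OpenEMMA) is to compare the device's compute capability against the arch list the wheel was built for:

```python
import torch

# Compute capability of GPU 0; an H800 reports (9, 0), i.e. sm_90.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"

# Architectures the installed PyTorch wheel was compiled for.
built_for = torch.cuda.get_arch_list()

print(f"device: {device_arch}, wheel built for: {built_for}")
if device_arch not in built_for:
    print("The installed PyTorch has no kernels for this GPU; "
          "install a build with sm_90 support (e.g. a PyTorch 2.x "
          "wheel built against CUDA 11.8 or newer).")
```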
