RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296 #33

@EasonChaozhou

Description

Thank you very much for your work. When running main.py with the Qwen2-VL model, I encountered the following error:
(openemma-env) root@gpu-node1:OpenEMMA-main# python main.py
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
set VIDEO_TOTAL_PIXELS: 90316800
qwen
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/cuda/__init__.py:155: UserWarning:
NVIDIA H800 with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H800 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.07s/it]

Loading NuScenes tables for version v1.0-trainval...
Loading nuScenes-lidarseg...
32 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
34149 lidarseg,
Done loading in 27.960 seconds.

Reverse indexing ...
Done reverse indexing in 6.2 seconds.

Number of scenes: 850
Scene scene-0103 has 40 frames
Created a temporary directory at /tmp/tmpmjfgbonx
Writing /tmp/tmpmjfgbonx/_remote_module_non_scriptable.py
Traceback (most recent call last):
  File "main.py", line 420, in <module>
    updated_intent) = GenerateMotion(obs_images, obs_ego_traj_world, obs_ego_velocities,
  File "main.py", line 204, in GenerateMotion
    scene_description = SceneDescription(obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 168, in SceneDescription
    result = vlm_inference(text=prompt, images=obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 86, in vlm_inference
    generated_ids = model.generate(**inputs, max_new_tokens=128)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1686, in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1041, in forward
    rotary_pos_emb = self.rot_pos_emb(grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1014, in rot_pos_emb
    hpos_ids = hpos_ids.reshape(
RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296
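
For context on the error itself: in `modeling_qwen2_vl.py`, `rot_pos_emb` reshapes an h*w position-id grid into `(h // spatial_merge_size, spatial_merge_size, w // spatial_merge_size, spatial_merge_size)`, and `spatial_merge_size` defaults to 2 for Qwen2-VL. A target shape of `[0, 2, 0, 2]` therefore means `h` and `w` were read back as 0 or 1, yet the flat tensor still held 7296 elements; both cannot be right, which suggests corrupted GPU-side values rather than bad input images. A minimal sketch of why the message reads this way (the numbers come from the trace above, not from OpenEMMA itself):

```python
import torch

merge = 2                  # Qwen2-VL spatial_merge_size default
h, w = 0, 0                # grid height/width as apparently read at crash time
flat = torch.arange(7296)  # the position-id tensor actually held 7296 elements

# Raises: RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296
flat.reshape(h // merge, merge, w // merge, merge)
```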
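The most likely root cause is the warning near the top of the log: the installed PyTorch wheel only ships kernels up to sm_86, while the H800 is sm_90, so CUDA ops can silently misbehave. A quick sanity check before reinstalling (a sketch, not part of OpenEMMA) is to compare the device's compute capability against the arch list the wheel was built for:

```python
import torch

# Compute capability of GPU 0; an H800 reports (9, 0), i.e. sm_90.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"

# Architectures the installed PyTorch wheel was compiled for.
built_for = torch.cuda.get_arch_list()

print(f"device: {device_arch}, wheel built for: {built_for}")
if device_arch not in built_for:
    print("The installed PyTorch has no kernels for this GPU; "
          "install a build with sm_90 support (e.g. a PyTorch 2.x "
          "wheel built against CUDA 11.8 or newer).")
```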
