Thank you very much for your work. When I run main.py with the Qwen2-VL model, I encounter the following error:
(openemma-env) root@gpu-node1:OpenEMMA-main# python main.py
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
set VIDEO_TOTAL_PIXELS: 90316800
qwen
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/cuda/__init__.py:155: UserWarning:
NVIDIA H800 with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H800 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
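As an aside, the warning just above says the installed wheel only ships kernels up to sm_86, while the H800 is sm_90. This is easy to confirm with standard torch calls (a minimal check, assuming torch imports cleanly in this environment):

import torch

# The wheel's compiled arch list must include sm_90 for the H800;
# sm_90 support generally requires a build against CUDA 11.8 or newer.
print(torch.__version__, torch.version.cuda)  # installed wheel + its CUDA toolkit
print(torch.cuda.get_device_capability(0))    # (9, 0) on an H800
print(torch.cuda.get_arch_list())             # per the warning, sm_90 is missing here

The run then continues: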
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.07s/it]
Loading NuScenes tables for version v1.0-trainval...
Loading nuScenes-lidarseg...
32 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
34149 lidarseg,
Done loading in 27.960 seconds.
Reverse indexing ...
Done reverse indexing in 6.2 seconds.
Number of scenes: 850
Scene scene-0103 has 40 frames
Created a temporary directory at /tmp/tmpmjfgbonx
Writing /tmp/tmpmjfgbonx/_remote_module_non_scriptable.py
Traceback (most recent call last):
  File "main.py", line 420, in <module>
    updated_intent) = GenerateMotion(obs_images, obs_ego_traj_world, obs_ego_velocities,
  File "main.py", line 204, in GenerateMotion
    scene_description = SceneDescription(obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 168, in SceneDescription
    result = vlm_inference(text=prompt, images=obs_images, processor=processor, model=model, tokenizer=tokenizer, args=args)
  File "main.py", line 86, in vlm_inference
    generated_ids = model.generate(**inputs, max_new_tokens=128)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/generation/utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1686, in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1041, in forward
    rotary_pos_emb = self.rot_pos_emb(grid_thw)
  File "/opt/conda/envs/openemma-env/lib/python3.8/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1014, in rot_pos_emb
    hpos_ids = hpos_ids.reshape(
RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296
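For reference, the failing reshape in rot_pos_emb targets [h // merge, merge, w // merge, merge] with merge = 2 (Qwen2-VL's spatial_merge_size), so '[0, 2, 0, 2]' means both h // 2 and w // 2 evaluated to 0 even though the position grid holds 7296 elements. The error itself is easy to reproduce in isolation (the numbers below are illustrative, taken from the traceback, not from inspecting my run):

import torch

# Illustrative reproduction of the exact RuntimeError above: reshaping a
# non-empty tensor to a shape containing zero-sized dims always fails this way.
merge = 2                        # Qwen2-VL spatial_merge_size
hpos_ids = torch.arange(7296)    # 7296 position ids, as in the traceback
h, w = 0, 0                      # hypothetical degenerate grid dims implied by '[0, 2, 0, 2]'
hpos_ids.reshape(h // merge, merge, w // merge, merge)
# RuntimeError: shape '[0, 2, 0, 2]' is invalid for input of size 7296

Since a 7296-element position grid cannot legitimately have both h < 2 and w < 2, the grid dimensions the vision tower reads look inconsistent with the pixel data. I wonder whether the sm_90 incompatibility warned about above is corrupting the CUDA-side grid_thw values, rather than anything being wrong with the images themselves.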