NaN problem and dimension problem #14
Description
Thanks for sharing your great work.
But when I try to run the model, some problems occur. Do you have any clue how to solve them?
When I first ran the model, I encountered the problem shown in the picture below.
After debugging, I found the problem occurs in:
```python
def visval_encode(self, event_tensor):
    with torch.no_grad():
        outputs = self.get_model().visual_tower.visual_tower(event_tensor)
        events_feature = outputs.last_hidden_state
        events_feature = events_feature.detach().requires_grad_(True)
        events_feature = self.get_model().visual_projector(events_feature)
    return events_feature
```
When I print the outputs, I find that the visual tower's output is NaN. In addition, CLIPVisionModel's output dimension is 1024, but the feature_adaptor expects 4096.
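To show where the NaNs first appear, I printed a quick check after each stage. A minimal sketch of that check (`check_features` is just a helper name I made up, not part of the repo):

```python
import torch

def check_features(t, name="features"):
    # Report shape plus NaN/Inf counts so the failing stage can be localized.
    print(f"{name}: shape={tuple(t.shape)}, "
          f"nan={torch.isnan(t).sum().item()}, "
          f"inf={torch.isinf(t).sum().item()}")

# Dummy tensor standing in for the visual tower output
# (CLIP ViT-L/14-336 yields 577 patch tokens of width 1024).
x = torch.randn(1, 577, 1024)
x[0, 0, 0] = float("nan")
check_features(x, "visual_tower output")
```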
I followed your guide in the README, but I'm not sure everything I did was correct. Could you give me some help?
Here's my config; I'm not sure my settings for the inference model and CLIP are suitable:
"mm_visual_tower": "/home/xxx/EventGPT/clip-vit-large-patch14-336"
```shell
python ./inference.py \
    --model_path "/home/xxx/EventGPT/checkpoints/EventGPT" \
    --event_frame "/home/xxx/EventGPT/samples/sample1.npy" \
    --query "Describe in detail what happened in the scene." \
    --temperature "0.4" \
    --top_p "1"
```
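For what it's worth, the 1024-vs-4096 mismatch makes me suspect the projector between CLIP and the language model is missing or has the wrong weights. A minimal sketch of what such a layer would do (the `Linear` sizes here are assumptions inferred from the error, not the repo's actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical projector: maps the CLIP ViT-L hidden size (1024) to the
# 4096-dim embedding space that feature_adaptor apparently expects.
visual_projector = nn.Linear(1024, 4096)

clip_features = torch.randn(1, 577, 1024)  # dummy last_hidden_state
projected = visual_projector(clip_features)
print(projected.shape)  # torch.Size([1, 577, 4096])
```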
Any suggestions would be greatly appreciated! Thank you in advance for your time and support.