
NaN problem and dimension problem #14

@noincapableofaction

Description


Thanks for sharing your great work.
But when I try to run the model, some problems occur. Do you have any clue how to solve them?
When I first ran the model, I encountered the error shown in the picture below.

[Image: error screenshot]

After debugging, I found the problem occurs in:

```python
def visval_encode(self, event_tensor):
    with torch.no_grad():
        outputs = self.get_model().visual_tower.visual_tower(event_tensor)
        events_feature = outputs.last_hidden_state
    events_feature = events_feature.detach().requires_grad_(True)
    events_feature = self.get_model().visual_projector(events_feature)
    return events_feature
```
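To localize where the NaN first appears, a minimal sketch like the one below can help. The `check` helper is hypothetical (not part of EventGPT), and the stand-in tensors only mirror the shapes reported in this issue: CLIP ViT-L/14-336 emits 1024-dim tokens, while the projector reportedly expects 4096.

```python
import torch

def check(name, t):
    # Fail early with a clear message if a stage produced NaN/Inf.
    if not torch.isfinite(t).all():
        raise RuntimeError(f"{name} produced non-finite values")
    return t

# Stand-in for the visual tower output: (batch, tokens, hidden) = (1, 577, 1024)
feat = check("visual_tower", torch.randn(1, 577, 1024))

# The projector's in_features must equal feat.shape[-1]; a mismatch here
# raises the same kind of dimension error described above.
projector = torch.nn.Linear(1024, 4096)
out = check("visual_projector", projector(feat))
print(out.shape)  # torch.Size([1, 577, 4096])
```

Wrapping each stage of the real `visval_encode` in a check like this pinpoints whether the NaN originates in the CLIP tower itself or in the projector.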

When I print the outputs, I find the visual_tower's output is NaN; also, the CLIPVisionModel's output dimension is 1024, but the feature_adaptor expects 4096.
I followed your guide in the README, but I'm not sure everything I did was correct. Could you give me some help?
Here is my config; I'm not sure the settings for the inference model and CLIP are suitable:
```
"mm_visual_tower": "/home/xxx/EventGPT/clip-vit-large-patch14-336"
```

```shell
python ./inference.py \
    --model_path "/home/xxx/EventGPT/checkpoints/EventGPT" \
    --event_frame "/home/xxx/EventGPT/samples/sample1.npy" \
    --query "Describe in detail what happened in the scene." \
    --temperature "0.4" \
    --top_p "1"
```
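Before suspecting the model, it may be worth ruling out the input. The helper below is a hypothetical sanity check (not part of the repo) that loads an event frame like the `--event_frame` argument above and reports whether it contains non-finite values:

```python
import numpy as np

def sanity_check(path):
    # Load an event frame and report its shape, dtype, and finiteness.
    frame = np.load(path)
    finite = bool(np.isfinite(frame.astype(np.float64)).all())
    return frame.shape, frame.dtype, finite

# Usage (with the sample path from the command above):
# print(sanity_check("/home/xxx/EventGPT/samples/sample1.npy"))
```

If `finite` comes back `False`, the NaN is already in the data before it reaches the visual tower.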

Any suggestions would be greatly appreciated! Thank you in advance for your time and support.
