NaN problem and dimension problem #14
Description
Thanks for sharing your great work.
But when I try to run the model, some problems occur. Do you have any clue how to solve them?
When I first ran the model, I encountered the problem shown in the picture below.
After debugging, I found the problem occurs in:
```python
def visval_encode(self, event_tensor):
    with torch.no_grad():
        outputs = self.get_model().visual_tower.visual_tower(event_tensor)
        events_feature = outputs.last_hidden_state
        events_feature = events_feature.detach().requires_grad_(True)
        events_feature = self.get_model().visual_projector(events_feature)
    return events_feature
```
When I print the outputs, I find that the visual tower's output is NaN. In addition, CLIPVisionModel's output dimension is 1024, but the feature_adaptor expects 4096.
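To show where the NaNs first appear, I printed a quick check after each stage. A minimal sketch of that check (`check_features` is just a helper name I made up, not part of the repo):

```python
import torch

def check_features(t, name="features"):
    # Report shape plus NaN/Inf counts so the failing stage can be localized.
    print(f"{name}: shape={tuple(t.shape)}, "
          f"nan={torch.isnan(t).sum().item()}, "
          f"inf={torch.isinf(t).sum().item()}")

# Dummy tensor standing in for the visual tower output
# (CLIP ViT-L/14-336 yields 577 patch tokens of width 1024).
x = torch.randn(1, 577, 1024)
x[0, 0, 0] = float("nan")
check_features(x, "visual_tower output")
```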
I followed your guide in the README, but I'm not sure everything I did was correct. Could you give me some help?
Here's my config; I'm not sure my settings for the inference model and CLIP are suitable:
"mm_visual_tower": "/home/xxx/EventGPT/clip-vit-large-patch14-336"
```shell
python ./inference.py \
    --model_path "/home/xxx/EventGPT/checkpoints/EventGPT" \
    --event_frame "/home/xxx/EventGPT/samples/sample1.npy" \
    --query "Describe in detail what happened in the scene." \
    --temperature "0.4" \
    --top_p "1"
```
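For what it's worth, the 1024-vs-4096 mismatch makes me suspect the projector between CLIP and the language model is missing or has the wrong weights. A minimal sketch of what such a layer would do (the `Linear` sizes here are assumptions inferred from the error, not the repo's actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical projector: maps the CLIP ViT-L hidden size (1024) to the
# 4096-dim embedding space that feature_adaptor apparently expects.
visual_projector = nn.Linear(1024, 4096)

clip_features = torch.randn(1, 577, 1024)  # dummy last_hidden_state
projected = visual_projector(clip_features)
print(projected.shape)  # torch.Size([1, 577, 4096])
```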
Any suggestions would be greatly appreciated! Thank you in advance for your time and support.