Open Space Leaderboard (Land and Air Domain)

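Models are evaluated on Short, Medium, and Long video scenarios across three reasoning types: temporal, spatial, and intent. Results are grouped by difficulty (Hard, Medium, Easy), and 🥇 marks the model with the highest overall average in each difficulty group.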

| Difficulty | Model | Size | Overall Avg. | Short Avg. | Short Temporal | Short Spatial | Short Intent | Medium Avg. | Medium Temporal | Medium Spatial | Medium Intent | Long Avg. | Long Temporal | Long Spatial | Long Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard | GPT 4o 🥇 | - | 21.26 | 24.05 | 25.32 | 30.34 | 16.5 | 25.19 | 27.57 | 31.07 | 16.91 | 14.66 | 5.5 | 30.5 | 8 |
| | Gemini 1.5 pro | - | 20.55 | 23.2 | 23.88 | 23.2 | 22.5 | 21.61 | 26.66 | 19.04 | 19.12 | 16.84 | 6 | 25.5 | 19 |
| | InternVL2.5 | 26B | 20.55 | 20.33 | 25 | 28.5 | 8.5 | 25.66 | 31 | 32 | 14 | 15.66 | 13 | 17 | 17 |
| | InternVL2.5 | 8B | 20.45 | 19.34 | 19 | 30.5 | 8.5 | 24.66 | 31 | 30 | 13 | 17.34 | 10.5 | 31.5 | 10 |
| | InternVL2.5 | 4B | 17.45 | 17 | 16 | 19 | 15 | 21 | 25 | 21 | 17 | 14.34 | 11.5 | 26 | 5.5 |
| | LLaVA Next | 32B | 17.05 | 19.67 | 15 | 33 | 11 | 14 | 9 | 22 | 11 | 17.5 | 7.5 | 35 | 10 |
| | LLaVA Video | 7B | 17.28 | 18 | 13 | 31.5 | 9.5 | 18.67 | 16 | 26 | 14 | 15.16 | 7.5 | 29 | 9 |
| | LLaVA OneVision | 7B | 14.67 | 15.16 | 8.5 | 27.5 | 9.5 | 15.34 | 15 | 17 | 14 | 13.5 | 8 | 23.5 | 9 |
| | Qwen2.5 VL | 32B | 19.44 | 19.66 | 8.5 | 35 | 15.5 | 25.33 | 25 | 24 | 27 | 13.33 | 2 | 28 | 10 |
| | Qwen2.5 VL | 7B | 19.72 | 22.66 | 8.5 | 30 | 29.5 | 22.66 | 21 | 31 | 16 | 13.84 | 3.5 | 30 | 8 |
| | Qwen2.5 VL | 3B | 19.05 | 19.5 | 12.5 | 32.5 | 13.5 | 20 | 22 | 26 | 12 | 17.67 | 10 | 29.5 | 13.5 |
| Medium | GPT 4o 🥇 | - | 37.72 | 42.08 | 43.24 | 55.5 | 27.5 | 31.95 | 39.84 | 30.34 | 25.66 | 39 | 44.5 | 37 | 35.5 |
| | Gemini 1.5 pro | - | 36.34 | 38.73 | 37.21 | 45 | 34 | 35.09 | 33.66 | 47.11 | 24.5 | 35.17 | 21 | 53.5 | 31 |
| | InternVL2.5 | 26B | 31.89 | 33.66 | 33.5 | 54 | 13.5 | 30.67 | 31 | 43 | 18 | 31.34 | 27.5 | 42.5 | 24 |
| | InternVL2.5 | 8B | 34.49 | 33.66 | 31.5 | 57.5 | 12 | 35 | 37 | 48 | 20 | 34.83 | 33 | 44.5 | 27 |
| | InternVL2.5 | 4B | 33.05 | 34.5 | 33 | 48.5 | 22 | 33.34 | 37 | 41 | 22 | 31.33 | 25.5 | 43 | 25.5 |
| | LLaVA Next | 32B | 23.05 | 26 | 17 | 44.5 | 16.5 | 18 | 16 | 25 | 13 | 25.16 | 20.5 | 38 | 17 |
| | LLaVA Video | 7B | 24.84 | 25.16 | 22 | 35 | 21 | 24.34 | 26 | 27 | 20 | 25 | 14.5 | 42.5 | 18 |
| | LLaVA OneVision | 7B | 20.17 | 19.66 | 23 | 32 | 16 | 18.67 | 19 | 20 | 17 | 22.16 | 16 | 32.5 | 18 |
| | Qwen2.5 VL | 32B | 30.95 | 30.5 | 16.5 | 46 | 29 | 32 | 31 | 40 | 25 | 30.34 | 14 | 50 | 27 |
| | Qwen2.5 VL | 7B | 28.95 | 31.84 | 26.5 | 33 | 36 | 28.34 | 28 | 33 | 24 | 26.66 | 25.5 | 23 | 31.5 |
| | Qwen2.5 VL | 3B | 29.73 | 32.17 | 25.5 | 44 | 27 | 25.67 | 24 | 37 | 16 | 31.34 | 31 | 35.5 | 27.5 |
| Easy | GPT 4o | - | 41.42 | 43.84 | 44.5 | 37.53 | 49.5 | 41.91 | 39.45 | 41.45 | 44.84 | 38.5 | 44.5 | 27.5 | 43.5 |
| | Gemini 1.5 pro | - | 44.5 | 48.33 | 48 | 47 | 50 | 39.46 | 48.51 | 34.36 | 35.5 | 45.84 | 46.5 | 47 | 44 |
| | InternVL2.5 | 26B | 44.33 | 48.16 | 49 | 51.5 | 44 | 40 | 43 | 45 | 32 | 44.83 | 46 | 51 | 37.5 |
| | InternVL2.5 | 8B | 44.27 | 46.17 | 41.5 | 53 | 44 | 40 | 45 | 42 | 33 | 46.66 | 57 | 52 | 31 |
| | InternVL2.5 | 4B | 42.61 | 48.33 | 44 | 55 | 46 | 38.33 | 39 | 41 | 35 | 41.16 | 39.5 | 54 | 30 |
| | LLaVA Next | 32B | 32.23 | 37.34 | 35.5 | 43.5 | 33 | 26.33 | 24 | 23 | 32 | 33.17 | 27.5 | 40 | 32 |
| | LLaVA Video | 7B | 32.33 | 33.16 | 32 | 34.5 | 33 | 34 | 36 | 37 | 29 | 29.84 | 25.5 | 31 | 33 |
| | LLaVA OneVision | 7B | 31.5 | 32.66 | 32.5 | 35.5 | 30 | 29.34 | 30 | 34 | 24 | 32.5 | 31.5 | 33 | 33 |
| | Qwen2.5 VL 🥇 | 32B | 47.84 | 50.5 | 46 | 53 | 52.5 | 46 | 43 | 46 | 49 | 47 | 43.5 | 52 | 45.5 |
| | Qwen2.5 VL | 7B | 40.28 | 42.33 | 41.5 | 30 | 55.5 | 37 | 40 | 29 | 42 | 41.5 | 44.5 | 29 | 51 |
| | Qwen2.5 VL | 3B | 42.84 | 46.33 | 37 | 53.5 | 48.5 | 38.67 | 40 | 44 | 32 | 43.5 | 38 | 52 | 40.5 |
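
The source does not state how the average columns are derived. From the numbers, each per-duration average looks like the unweighted mean of the three reasoning-type scores, and the overall average looks like the unweighted mean of the three per-duration averages. The sketch below checks that assumption against the GPT 4o row in the Hard group. The helper names (`duration_averages`, `overall_average`) are hypothetical and not part of any released evaluation code.

```python
from statistics import mean

# Scores copied from the Hard / GPT 4o row of the leaderboard.
gpt4o_hard = {
    "short": {"temporal": 25.32, "spatial": 30.34, "intent": 16.5},
    "medium": {"temporal": 27.57, "spatial": 31.07, "intent": 16.91},
    "long": {"temporal": 5.5, "spatial": 30.5, "intent": 8.0},
}


def duration_averages(scores):
    """Assumed rule: unweighted mean over reasoning types, per video length."""
    return {duration: mean(by_type.values()) for duration, by_type in scores.items()}


def overall_average(scores):
    """Assumed rule: unweighted mean of the three per-duration averages."""
    return mean(duration_averages(scores).values())


per_duration = duration_averages(gpt4o_hard)
overall = overall_average(gpt4o_hard)

for duration, value in per_duration.items():
    print(f"{duration:>6} avg: {value:.2f}")
print(f"overall avg: {overall:.2f}")
```

For this row the recomputed per-duration averages land within 0.01 of the reported 24.05, 25.19, and 14.66, and the recomputed overall average is within 0.05 of the reported 21.26. A few other rows differ by somewhat more, so treat the rule as an approximation of how the table was built rather than a confirmed definition.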