diff --git a/README.md b/README.md
index 2d62153..3d307be 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ This repository contains the code and data associated with our ICML 2025 paper,
 |----------------------|:-------------:|:---------------:|:-----------------------:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | GPT-4.1 🆕 | 1M | 16K | 97.0 (82.5) | 95.6 | 95.2 | 91.7 | 87.5 | 84.9 | 79.8 | 69.7 | 64.7 |
 | GPT-4o | 128K | 8K | 99.3 (84.4) | 98.1 | 98.0 | 95.7 | 89.2 | 81.6 | 69.7 | 62.4 | 56.0 |
+| MiniCPM-SALA 🆕 | 1M | 4K | 98.2 (83.5) | 92.8 | 91.4 | 88.9 | 81.5 | 71.3 | 59.7 | 53.2 | *40.9* |
 | Llama 3.3 70B | 128K | 2K | 97.3 (82.7) | 94.2 | 87.4 | 81.5 | 72.1 | 59.5 | *42.7* | -- | -- |
 | Llama 3.1 405B | 128K | 2K | 94.7 (80.5) | 89.0 | 85.0 | 74.5 | 60.1 | 48.4 | *38.0* | -- | -- |
 | Llama 3.1 70B | 128K | 2K | 94.5 (80.3) | 91.0 | 81.8 | 71.2 | 62.7 | 51.8 | *43.2* | -- | -- |
@@ -54,6 +55,7 @@ Gemini 2.5 Pro and Gemini 2.5 Flash (w/ Thinking) results are included in the No
 | - w/ CoT | 97.1 | 73.0 | 51.2 | *31.8* | *10.1* |
 | **Reasoning Models** | | | | | |
 | GPT-o3 🆕 | 100.0 | 94.4 | 86.2 | 74.9 | 58.5 |
+| MiniCPM-SALA 🆕 | 98.2 | 90.3 | 84.6 | 77.5 | 69.9 |
 | Gemini 2.5 Pro 🆕 | 99.1 | 73.9 | 63.0 | 58.6 | 58.6 |
 | GPT-o1 | 99.9 | 92.0 | 78.0 | 60.1 | *31.1* |
 | DeepSeek R1-Distill-Llama-70B | 99.9 | 91.4 | 75.5 | *49.4* | *20.7* |