From 7dd27ff33a3162df58ddbd2c5aaf1a8e97e2e405 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=9E=97=E5=BC=BC=E8=BF=9C?=
Date: Fri, 6 Mar 2026 23:51:43 +0800
Subject: [PATCH 1/2] Add MiniCPM-SALA results to leaderboard

Made-with: Cursor
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 2d62153..6f62d6c 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ This repository contains the code and data associated with our ICML 2025 paper,
 |----------------------|:-------------:|:---------------:|:-----------------------:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | GPT-4.1 🆕 | 1M | 16K | 97.0 (82.5) | 95.6 | 95.2 | 91.7 | 87.5 | 84.9 | 79.8 | 69.7 | 64.7 |
 | GPT-4o | 128K | 8K | 99.3 (84.4) | 98.1 | 98.0 | 95.7 | 89.2 | 81.6 | 69.7 | 62.4 | 56.0 |
+| MiniCPM-SALA 🆕 | 1M | 4K | 98.5 (83.7) | 92.8 | 91.0 | 89.2 | 80.2 | 68.5 | 54.0 | *41.7* | *21.3* |
 | Llama 3.3 70B | 128K | 2K | 97.3 (82.7) | 94.2 | 87.4 | 81.5 | 72.1 | 59.5 | *42.7* | -- | -- |
 | Llama 3.1 405B | 128K | 2K | 94.7 (80.5) | 89.0 | 85.0 | 74.5 | 60.1 | 48.4 | *38.0* | -- | -- |
 | Llama 3.1 70B | 128K | 2K | 94.5 (80.3) | 91.0 | 81.8 | 71.2 | 62.7 | 51.8 | *43.2* | -- | -- |

From 9ebdaaf34f1ac2425f1edf95042c8c3af866e5c9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=9E=97=E5=BC=BC=E8=BF=9C?=
Date: Fri, 13 Mar 2026 17:05:30 +0800
Subject: [PATCH 2/2] Update MiniCPM-SALA results and add reasoning model entry

Made-with: Cursor
---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6f62d6c..3d307be 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ This repository contains the code and data associated with our ICML 2025 paper,
 |----------------------|:-------------:|:---------------:|:-----------------------:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | GPT-4.1 🆕 | 1M | 16K | 97.0 (82.5) | 95.6 | 95.2 | 91.7 | 87.5 | 84.9 | 79.8 | 69.7 | 64.7 |
 | GPT-4o | 128K | 8K | 99.3 (84.4) | 98.1 | 98.0 | 95.7 | 89.2 | 81.6 | 69.7 | 62.4 | 56.0 |
-| MiniCPM-SALA 🆕 | 1M | 4K | 98.5 (83.7) | 92.8 | 91.0 | 89.2 | 80.2 | 68.5 | 54.0 | *41.7* | *21.3* |
+| MiniCPM-SALA 🆕 | 1M | 4K | 98.2 (83.5) | 92.8 | 91.4 | 88.9 | 81.5 | 71.3 | 59.7 | 53.2 | *40.9* |
 | Llama 3.3 70B | 128K | 2K | 97.3 (82.7) | 94.2 | 87.4 | 81.5 | 72.1 | 59.5 | *42.7* | -- | -- |
 | Llama 3.1 405B | 128K | 2K | 94.7 (80.5) | 89.0 | 85.0 | 74.5 | 60.1 | 48.4 | *38.0* | -- | -- |
 | Llama 3.1 70B | 128K | 2K | 94.5 (80.3) | 91.0 | 81.8 | 71.2 | 62.7 | 51.8 | *43.2* | -- | -- |
@@ -55,6 +55,7 @@ Gemini 2.5 Pro and Gemini 2.5 Flash (w/ Thinking) results are included in the No
 | - w/ CoT | 97.1 | 73.0 | 51.2 | *31.8* | *10.1* |
 | **Reasoning Models** | | | | | |
 | GPT-o3 🆕 | 100.0 | 94.4 | 86.2 | 74.9 | 58.5 |
+| MiniCPM-SALA 🆕 | 98.2 | 90.3 | 84.6 | 77.5 | 69.9 |
 | Gemini 2.5 Pro 🆕 | 99.1 | 73.9 | 63.0 | 58.6 | 58.6 |
 | GPT-o1 | 99.9 | 92.0 | 78.0 | 60.1 | *31.1* |
 | DeepSeek R1-Distill-Llama-70B | 99.9 | 91.4 | 75.5 | *49.4* | *20.7* |