【Test-Cases】: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math) by wanlongze · Pull Request #315 · AISBench/benchmark

wanlongze · 2026-05-30T09:19:18Z

概述

补充核心数据集的单元测试，提升 UT 覆盖率。新增约 107 个测试，全部通过 pytest 验证。

变更内容

新建文件

文件	测试数	覆盖内容
`test_aime2026.py`	4	Aime2026Dataset 单条/多条加载、空白处理、字段保留
`test_dapo_math.py`	31	boxed 函数、normalize_final_answer、extractor、后处理器、Evaluator/V2

增强文件

文件	新增	覆盖路径
`test_aime_datasets.py`	+4	多记录加载、字段映射、Aime2025JDGDataset._get_dataset_class
`test_gsm8k.py`	+4	get_action 找到/未找到、agent score 各分支
`test_gpqa.py`	+7	评分长度不匹配、全对/全错、details 结构、后处理器、多行洗牌
`test_realworldqa.py`	+5	无空格冒号、多冒号、tab、全 dict/混合 references
`test_math.py`	+52	多 boxed、fbox 异常、normalize 分支、strip_string v1/v2 各路径、is_equiv v2、Agent 评分路径

测试结果

关联 Issue

提升核心数据集（AIME、RealworldQA、MATH、GSM8K、GPQA、DAPO-Math）的单元测试覆盖率。

…8k, gpqa, dapo_math) - test_aime2026.py: new file with 4 tests for Aime2026Dataset - test_dapo_math.py: new file with 31 tests covering boxed functions, normalization, extractors, postprocessors and evaluators - test_aime_datasets.py: +4 tests (multi-record, field mapping, JDG class) - test_gsm8k.py: +4 tests (get_action, agent score paths) - test_gpqa.py: +7 tests (evaluator edge cases, postprocess, multi-row) - test_realworldqa.py: +5 tests (colon variants, tabs, dict refs) - test_math.py: +52 tests (boxed edge cases, normalize branches, strip_string v1/v2 paths, is_equiv versions, agent evaluator paths) Total: ~107 new tests, all passing pytest.

gemini-code-assist

Code Review

This pull request significantly expands the unit test coverage for various dataset loaders, postprocessors, and evaluators, including AIME (2024, 2025, 2026), DAPO Math, GPQA, GSM8K, MATH, and RealworldQA. The review feedback focuses on improving test quality and consistency: it suggests using new_callable=mock_open directly in @patch decorators to simplify file mocking, avoiding __new__ to bypass initialization in tests, inheriting from unittest.TestCase in test_dapo_math.py for consistency, and moving inline imports of AISBenchDataContentError to the top of the files.

gemini-code-assist · 2026-05-30T09:21:30Z

+    @patch("builtins.open")
+    def test_aime2026_load(self, mock_open_file, mock_get_path):
+        line = '{"question": "What is 1+1?", "answer": "2"}'
+        m = mock_open(read_data=line + "\n")
+        mock_open_file.return_value = m.return_value


Using new_callable=mock_open directly in the @patch decorator is cleaner and avoids manual assignment of mock_open_file.return_value = m.return_value.

Suggested change

@patch("builtins.open")

def test_aime2026_load(self, mock_open_file, mock_get_path):

line = '{"question": "What is 1+1?", "answer": "2"}'

m = mock_open(read_data=line + "\n")

mock_open_file.return_value = m.return_value

@patch("builtins.open", new_callable=mock_open, read_data='{"question": "What is 1+1?", "answer": "2"}\n')

def test_aime2026_load(self, mock_open_file, mock_get_path):

gemini-code-assist · 2026-05-30T09:21:30Z

+    @patch("builtins.open")
+    def test_aime2024_multiple_records(self, mock_open_file, mock_get_path):
+        line1 = '{"origin_prompt": "Q1?", "gold_answer": "1"}'
+        line2 = '{"origin_prompt": "Q2?", "gold_answer": "2"}'
+        m = mock_open(read_data=line1 + "\n" + line2 + "\n")
+        mock_open_file.return_value = m.return_value


Using new_callable=mock_open directly in the @patch decorator is cleaner and avoids manual assignment of mock_open_file.return_value = m.return_value.

Suggested change

@patch("builtins.open")

def test_aime2024_multiple_records(self, mock_open_file, mock_get_path):

line1 = '{"origin_prompt": "Q1?", "gold_answer": "1"}'

line2 = '{"origin_prompt": "Q2?", "gold_answer": "2"}'

m = mock_open(read_data=line1 + "\n" + line2 + "\n")

mock_open_file.return_value = m.return_value

@patch("builtins.open", new_callable=mock_open, read_data='{"origin_prompt": "Q1?", "gold_answer": "1"}\n{"origin_prompt": "Q2?", "gold_answer": "2"}\n')

def test_aime2024_multiple_records(self, mock_open_file, mock_get_path):

gemini-code-assist · 2026-05-30T09:21:30Z

+    def test_aime2025_jdg_dataset_class(self):
+        instance = Aime2025JDGDataset.__new__(Aime2025JDGDataset)
+        result = instance._get_dataset_class()
+        self.assertIs(result, Aime2025Dataset)


Using __new__ to bypass __init__ is generally considered a code smell in unit tests. Since _get_dataset_class is a method that does not access self, you can call it directly on the class by passing None as self.

Suggested change

def test_aime2025_jdg_dataset_class(self):

instance = Aime2025JDGDataset.__new__(Aime2025JDGDataset)

result = instance._get_dataset_class()

self.assertIs(result, Aime2025Dataset)

def test_aime2025_jdg_dataset_class(self):

result = Aime2025JDGDataset._get_dataset_class(None)

self.assertIs(result, Aime2025Dataset)

gemini-code-assist · 2026-05-30T09:21:30Z

+from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError
+
+
+class TestBoxedFunctions:


For consistency with the rest of the test suite (which all use unittest.TestCase), consider inheriting from unittest.TestCase and using self.assert* methods instead of plain assert statements.

gemini-code-assist · 2026-05-30T09:21:30Z

+    @patch("builtins.open")
+    def test_dataset_multiple_rows(self, mock_open_file, mock_get_path):
+        content = (
+            "h0,h1,h2,h3,h4,h5,h6,Question,A,B,C,D\n"
+            ",,,,,,,Q1,opt1_a,opt1_b,opt1_c,opt1_d\n"
+            ",,,,,,,Q2,opt2_a,opt2_b,opt2_c,opt2_d\n"
+        )
+        m = mock_open(read_data=content)
+        mock_open_file.return_value = m.return_value


Using new_callable=mock_open directly in the @patch decorator is cleaner and avoids manual assignment of mock_open_file.return_value = m.return_value.

Suggested change

@patch("builtins.open")

def test_dataset_multiple_rows(self, mock_open_file, mock_get_path):

content = (

"h0,h1,h2,h3,h4,h5,h6,Question,A,B,C,D\n"

",,,,,,,Q1,opt1_a,opt1_b,opt1_c,opt1_d\n"

",,,,,,,Q2,opt2_a,opt2_b,opt2_c,opt2_d\n"

)

m = mock_open(read_data=content)

mock_open_file.return_value = m.return_value

@patch("builtins.open", new_callable=mock_open, read_data="h0,h1,h2,h3,h4,h5,h6,Question,A,B,C,D\n,,,,,,,Q1,opt1_a,opt1_b,opt1_c,opt1_d\n,,,,,,,Q2,opt2_a,opt2_b,opt2_c,opt2_d\n")

def test_dataset_multiple_rows(self, mock_open_file, mock_get_path):

gemini-code-assist · 2026-05-30T09:21:30Z

+    def test_extract_boxed_answer_fbox(self):
+        from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError
+        text = "The answer is \\fbox{99}"


Please move the import of AISBenchDataContentError to the top of the file to avoid redundant inline imports across multiple test methods.

Suggested change

def test_extract_boxed_answer_fbox(self):

from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError

text = "The answer is \\fbox{99}"

def test_extract_boxed_answer_fbox(self):

text = "The answer is \\fbox{99}"

gemini-code-assist · 2026-05-30T09:21:30Z

+    def test_remove_right_units_multiple_splits(self):
+        from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError
+        evaluator = MATHEvaluator()


Please move the import of AISBenchDataContentError to the top of the file to avoid redundant inline imports across multiple test methods.

Suggested change

def test_remove_right_units_multiple_splits(self):

from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError

evaluator = MATHEvaluator()

def test_remove_right_units_multiple_splits(self):

evaluator = MATHEvaluator()

gemini-code-assist Bot reviewed May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

【Test-Cases】: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math)#315

【Test-Cases】: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math)#315
wanlongze wants to merge 1 commit into
AISBench:masterfrom
wanlongze:feat/add-core-dataset-ut-coverage

wanlongze commented May 30, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		from ais_bench.benchmark.utils.logging.exceptions import AISBenchDataContentError


		class TestBoxedFunctions:

Conversation

wanlongze commented May 30, 2026

概述

变更内容

新建文件

增强文件

测试结果

关联 Issue

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant