Pull requests: AISBench/benchmark
【Test-Cases】: add UT coverage for core datasets (aime, realworldqa, math, gsm8k, gpqa, dapo_math)
#315
opened May 30, 2026 by
wanlongze
Contributor
[feature] Support run terminal-bench-2.0 with harbor
feature
#314
opened May 30, 2026 by
SJTUyh
Collaborator
3 of 16 tasks
Fix vllm_api_general_chat endpoint join for base URLs without trailing slash
bugfix
test-cases
#313
opened May 29, 2026 by
Copilot
AI
4 of 15 tasks
[bugfix]: fix answer extraction regex and evaluator bugs in MMLU-Pro
bugfix
#269
opened May 2, 2026 by
lvhua6352
1 of 15 tasks
[Bugfix] Add pred and choices parsing to fix the issue of score=0 for…
bugfix
#258
opened Apr 20, 2026 by
Yanguan619
1 of 15 tasks
Fix max_out_len handling for multi-turn ShareGPT conversations
bugfix
#243
opened Apr 12, 2026 by
Shadowless-ly
6 of 15 tasks
Fix the issue where TTFT and TPOT have no data when running Kimi2.5 i…
#210
opened Mar 21, 2026 by
GaoHuaZhang
Collaborator
15 tasks
update fix on textvqa, mmmu, mmstar, add patch for glm4.6v
bugfix
#188
opened Mar 13, 2026 by
Shane120283483
1 of 15 tasks
[UT] Add new UT for Gedit feature
test-cases
#163
opened Mar 5, 2026 by
SJTUyh
Collaborator
1 of 15 tasks
[feature] [sub feature 2] Dependency for qwen image edit run
feature
#151
opened Feb 13, 2026 by
SJTUyh
Collaborator
1 of 15 tasks
[feature] [sub feature 3] Support qwen Image edit infer with gedit dataset
feature
#150
opened Feb 13, 2026 by
SJTUyh
Collaborator
1 of 15 tasks
【TEST】Add smoke test cases for the math and agieval datasets
test-cases
#145
opened Feb 11, 2026 by
GaoHuaZhang
Collaborator
1 of 15 tasks