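Operating system and version
openEuler 24.03
Python environment where the tool is installed
Python environment inside a Docker container
Python version
3.11
AISBench tool version
Name: ais_bench_benchmark Version: 3.1.20260508
AISBench command executed
ais_bench ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified_mini.py --num-prompts 1 --debug
Model config file or custom config file contents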
from ais_bench.benchmark.datasets import SWEBenchDataset
from ais_bench.benchmark.partitioners import NaivePartitioner
from ais_bench.benchmark.runners import LocalRunner
from ais_bench.benchmark.tasks import SWEBenchInferTask, SWEBenchEvalTask
from ais_bench.benchmark.summarizers import SWEBenchSummarizer

STEP_LIMIT = 500

models = [
    dict(
        attr="local",
        abbr="swebench",
        type="LiteLLMChat",
        model="qwen3.6",
        api_key="EMPTY",
        url="http://xxxx/v1",  # API base, e.g. http://127.0.0.1:8000/v1
        batch_size=1,
        generation_kwargs=dict(),
    )
]

datasets = [
    dict(
        type=SWEBenchDataset,
        abbr="swebench_verified_mini",
        # Relative to AIS_BENCH_DATASETS_CACHE (default: project root); missing -> HF download
        path="/home/zhouyue/swe_test_aisbench_2/swe-bench-verified-mini/data",
        name="verified_mini",
        split="test",
        step_limit=STEP_LIMIT,
        filter_spec="",
        shuffle=False,
    ),
]

summarizer = dict(
    attr="accuracy",
    type=SWEBenchSummarizer,
)

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        task=dict(type=SWEBenchInferTask),
    ),
)

eval = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        task=dict(type=SWEBenchEvalTask),
    ),
)
Expected behavior
The test runs normally.
Actual behavior
The generated code patch is empty.
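
To make the symptom easier to confirm, a minimal sketch of how empty patches could be counted is shown here. It assumes the inference output can be loaded as a list of SWE-bench-style prediction records with `instance_id` and `model_patch` fields; whether AISBench writes its results in this shape has not been verified, and the helper name and sample records are hypothetical.

```python
def count_empty_patches(predictions):
    """Return (empty, total) for a list of prediction records.

    Assumption: each record is a dict with a "model_patch" string,
    following the common SWE-bench predictions layout.
    """
    empty = sum(
        1 for record in predictions
        # A missing, None, or whitespace-only patch all count as empty.
        if not (record.get("model_patch") or "").strip()
    )
    return empty, len(predictions)


# Hypothetical records for illustration only, not real benchmark output.
sample = [
    {"instance_id": "example__repo-1", "model_patch": ""},
    {"instance_id": "example__repo-2", "model_patch": "diff --git a/x.py b/x.py\n"},
]
print(count_empty_patches(sample))
```

If every record comes back empty with `--num-prompts 1`, the problem is more likely in the agent loop or the model endpoint than in one specific task instance.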

Pre-checks