A minimal fork of SWE-agent/mini-swe-agent, customized for local evaluation with vLLM.
Miniforge is a lightweight Conda installer.

```bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
```

See the official Miniforge repo for more details.
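To confirm the installation, restart your shell and check that `conda` resolves:

```bash
# Verify that Miniforge's conda is on your PATH.
conda --version
```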
```bash
git clone https://github.com/stovecat/mini-swe-agent.git
cd mini-swe-agent

# Modify `prefix` accordingly to reflect your working directory.
vi environment.yml

conda env create -f environment.yml
conda activate swe
```
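If you prefer to update the `prefix` non-interactively, here is a minimal sketch, assuming your Miniforge installation lives under `~/miniforge3` (the path is an assumption):

```bash
# Point the env prefix at your own Miniforge installation (illustrative path).
sed -i "s|^prefix:.*|prefix: $HOME/miniforge3/envs/swe|" environment.yml
```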
Install and configure the SWE-bench CLI:

```bash
pip install sb-cli
sb-cli gen-api-key your.email@example.com
```

Then export your API key and verify it:

```bash
export SWEBENCH_API_KEY=your_api_key
sb-cli verify-api-key YOUR_VERIFICATION_CODE
```

See the SWE-bench CLI docs for reference.
Edit your `~/.bashrc` (or `~/.zshrc`) and add:

```bash
export HF_HOME=[YOUR_CACHE_PATH]/.cache/huggingface
export SWEBENCH_API_KEY=[YOUR_SWE_API_KEY]
```

Then reload your shell and activate the environment:

```bash
source ~/.bashrc
conda activate swe
```

Run the model initialization script (modify variables as needed):
```bash
vi scripts/init_run_vllm_model.sh
```

Required variables to modify:

- CUDA device
- `max-model-len`
- `tensor-parallel-size`
- port number
- `MODEL_NAME` (your vLLM service name)
- `HF_MODEL_NAME` (full Hugging Face model name, including org prefix)
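For orientation, the launch command configured by such a script might look roughly like the sketch below; every value here is illustrative, not the script's actual contents:

```bash
# Illustrative vLLM launch: serves an OpenAI-compatible endpoint.
CUDA_VISIBLE_DEVICES=0,1 vllm serve openai/gpt-oss-120b \
  --served-model-name gpt-oss-120b \
  --max-model-len 131072 \
  --tensor-parallel-size 2 \
  --port 8000
```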
The working directory should be mounted so that `HF_HOME` is visible inside the Docker container, e.g., if `HF_HOME=/mnt/sda/hojae/.cache/huggingface` locally:

```
... -v /mnt/sda/hojae:/workspace \
    -e HF_HOME=/workspace/.cache/huggingface ...
```
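To confirm the mount works, assuming a hypothetical container named `vllm`:

```bash
# The Hugging Face cache should be listable from inside the container.
docker exec vllm sh -c 'ls "$HF_HOME"'
```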
If you are using OpenAI models such as gpt-oss-120b, ensure that the `tiktoken_cache` is stored under `.cache`.
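tiktoken honors the `TIKTOKEN_CACHE_DIR` environment variable, so one way to keep the encodings under `.cache` is (same path placeholder as above):

```bash
# Store tiktoken encodings where offline/containerized runs can find them.
export TIKTOKEN_CACHE_DIR=[YOUR_CACHE_PATH]/.cache/tiktoken_cache
```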
```bash
bash scripts/init_run_vllm_model.sh
```

When the model loads successfully, stop it and create your run script (e.g., based on `scripts/run_vllm_gpt-oss-120b.sh`).
⚠️ Important: Use `HF_HUB_OFFLINE=1` (local mode) to ensure greedy decoding and temperature settings apply correctly.
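For example, at the top of your run script:

```bash
# Run the Hugging Face Hub in offline mode so only the local cache is used.
export HF_HUB_OFFLINE=1
```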
Your `MODEL_NAME` must have a corresponding entry in `src/minisweagent/config/model_prices_and_context_window.json`.
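A hypothetical entry, following LiteLLM's registry schema (all field values are illustrative; adjust to your model's actual context window):

```json
"[MODEL_NAME]": {
  "max_input_tokens": 131072,
  "max_output_tokens": 32768,
  "input_cost_per_token": 0.0,
  "output_cost_per_token": 0.0,
  "litellm_provider": "openai",
  "mode": "chat"
}
```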
```bash
cp scripts/run_vllm_gpt-oss-120b.sh scripts/run_vllm_[MODEL_NAME].sh
vi scripts/run_vllm_[MODEL_NAME].sh
bash scripts/run_vllm_[MODEL_NAME].sh
```
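Once the server is up, the OpenAI-compatible endpoint can be sanity-checked (port is illustrative):

```bash
# Should list the served model name registered with vLLM.
curl http://localhost:8000/v1/models
```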
Use `src/minisweagent/config/extra/vllm_gpt-oss-120b_swebench.yaml` as a template. Set:

```yaml
model_name: [MODEL_NAME]
litellm_model_registry: [ABSOLUTE_PATH_TO model_prices_and_context_window.json]
port: [SAME_PORT_NUMBER_AS_IN_vLLM_LAUNCH_SCRIPT]
```

Edit and execute the evaluation script:
```bash
# Modify BASE_DIR and MODEL_NAME
vi scripts/eval_vllm_swebench.sh
bash scripts/eval_vllm_swebench.sh
```
`MODEL_NAME` must match across:

- vLLM launch script
- YAML configuration
- Evaluation script
- `model_prices_and_context_window.json`
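One quick way to catch a mismatch is to grep for the name across all four locations (filenames are illustrative):

```bash
# Every file should print at least one matching line.
grep -nF "gpt-oss-120b" \
  scripts/run_vllm_gpt-oss-120b.sh \
  scripts/eval_vllm_swebench.sh \
  src/minisweagent/config/extra/vllm_gpt-oss-120b_swebench.yaml \
  src/minisweagent/config/model_prices_and_context_window.json
```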
| Model | % Resolved | Version |
|---|---|---|
| Reported | ||
| gpt-oss-120b | 26.00 | 1.7.0 |
| Llama-4-Scout-17B-16E | 9.06 | 0.0.0 |
| Qwen2.5-Coder-32B-Instruct | 9.00 | 1.0.0 |
| Reproduced | ||
| gpt-oss-120b | 30.00 | 1.14.2 |
| Qwen3-30B-A3B-Instruct-2507 | 11.00 | 1.14.2 |