SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

A framework for multiple-choice evaluation that mitigates selection bias by counterbalancing position and semantic preferences in language models.

  • Paper: SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models (https://arxiv.org/abs/2507.18182)
  • Core idea: use Inverse-Positioning (IP) to offset models’ positional biases and Semantic-Spread (SS) to spatially separate similar distractors, reducing guesswork.

✨ TL;DR

  • Position bias: models disproportionately select certain answer slots (e.g., first/last); IP offsets this by placing the true answer in a less-preferred position.
  • Semantic bias: models tend to choose semantically similar distractors when uncertain; SS identifies near-miss distractors and spreads them apart to prevent clustering.
  • Combined: applying IP and SS jointly yields a fairer multiple-choice benchmark for large language models.
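The two counterbalancing steps can be sketched in a few lines. This is an illustrative toy, not the paper's exact algorithm: the greedy "most-similar distractor goes to the farthest slot" heuristic and all function names are assumptions for the sake of the example.

```python
def inverse_position(bias_rates):
    """Inverse-Positioning (IP) sketch: given the model's empirical
    selection rate per answer slot, return the least-preferred slot
    so the gold answer can be placed there."""
    return min(range(len(bias_rates)), key=lambda i: bias_rates[i])


def spread_options(gold, distractors, similarity, gold_slot):
    """Semantic-Spread (SS) sketch: place the gold answer at gold_slot
    and push the distractor most similar to the gold answer into the
    slot farthest from it, so near-miss options do not cluster."""
    n = len(distractors) + 1
    ranked = sorted(distractors, key=similarity, reverse=True)
    slots = [s for s in range(n) if s != gold_slot]
    slots.sort(key=lambda s: abs(s - gold_slot), reverse=True)
    options = [None] * n
    options[gold_slot] = gold
    for d, s in zip(ranked, slots):
        options[s] = d
    return options
```

For example, with measured slot preferences `[0.40, 0.20, 0.15, 0.25]`, IP would put the gold answer in slot 2, and SS would then assign the most gold-similar distractor to slot 0, the farthest remaining position.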

SCOPE Pipeline (IP + SS)


🛠️ Quick start

All scripts are designed for reproducibility; you should be able to run the benchmarks within a few minutes.

Clone & setup

# 1) clone
git clone https://github.com/WonjunJeong97/SCOPE.git
cd SCOPE

# 2) Python deps (3.10+)
python -m venv .venv && source .venv/bin/activate   # or: conda create -n scope python=3.10 -y && conda activate scope
pip install -r requirements.txt

Environment variables

cp .env.example .env
# Edit .env with any required API keys/tokens (e.g., OpenAI, HuggingFace) if your model requires them.
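To fail fast before launching an evaluation, you can check that the keys you need are actually set. A minimal stdlib sketch; the key list here is hypothetical, so consult .env.example for the real one:

```python
import os

# Hypothetical required keys -- replace with the ones listed in .env.example.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty (empty list = ready)."""
    return [k for k in required if not env.get(k)]
```

Running `missing_keys()` after loading your .env (e.g. via python-dotenv or `source`) tells you immediately which credentials are still missing.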

Jupyter notebooks

python -m pip install jupyter
jupyter lab
# Open notebooks under notebooks/ and run the first cells to verify your setup.

Quick smoke test (1–2 min)

Run the built-in test mode to verify your installation end to end:

bash scripts/run_evaluation.sh -t
# Optionally pin dataset/model (same test mode, just more explicit):
bash scripts/run_evaluation.sh -t -d csqa -m gpt-3.5-turbo

If it completes without errors, you’re ready to reproduce the paper.

Note: This assumes .env is set up and the fixed datasets exist at the paths in configs/default.yaml.
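Since the smoke test depends on the dataset paths in configs/default.yaml existing on disk, a quick sanity check can save a failed run. A stdlib-only sketch; the `datasets` key name is an assumption, so check the actual schema in configs/default.yaml:

```python
import pathlib


def check_paths(cfg: dict) -> list[str]:
    """Return configured dataset paths that do not exist on disk.

    `cfg` is the parsed YAML config (e.g. via yaml.safe_load); the
    'datasets' mapping of name -> path is a hypothetical schema.
    """
    return [p for p in cfg.get("datasets", {}).values()
            if not pathlib.Path(p).exists()]
```

An empty return value means every configured dataset path was found.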


📁 Repository structure

SCOPE/
├─ configs/        # per-table/figure experiment configs (YAML)
├─ figures/        # static images for README/docs (pipeline, schematics)
├─ scripts/        # download / train / eval / run_all helpers
├─ src/            # core implementation (data, models, utils, train.py, etc.)
├─ notebooks/      # demo & reproduction notebooks
├─ requirements.txt
├─ .env.example    # environment variable template
└─ README.md

📚 Citation

If this repository or the SCOPE framework helps your research, please cite:

@article{jeong2025scope,
  title   = {SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models},
  author  = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  journal = {arXiv preprint arXiv:2507.18182},
  year    = {2025}
}

You may also cite the codebase itself (optional):

@misc{scope_code_2025,
  title        = {SCOPE Codebase},
  author       = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  howpublished = {\url{https://github.com/WonjunJeong97/SCOPE}},
  year         = {2025}
}

🤝 Contact

  • Maintainer: Wonjun Jeong (tp04045@gachon.ac.kr)
  • Questions & issues: please open a GitHub Issue in this repository.

📝 License

This project is released under the terms of the license in LICENSE.
