SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

A framework for multiple-choice evaluation that mitigates selection bias by counterbalancing position and semantic preferences in language models.

  • Paper: SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models (https://arxiv.org/abs/2507.18182)
  • Core idea: use Inverse-Positioning (IP) to offset models’ positional biases and Semantic-Spread (SS) to spatially separate similar distractors, reducing guesswork.

✨ TL;DR

  • Position bias: models disproportionately select certain answer slots (e.g., first/last); IP offsets this by placing the true answer in a less-preferred position.
  • Semantic bias: models tend to choose semantically similar distractors when uncertain; SS identifies near-miss distractors and spreads them apart to prevent clustering.
  • Combined: applying IP and SS jointly yields a fairer multiple-choice benchmark for large language models.
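The two counterbalancing steps can be sketched in a few lines. This is an illustrative toy, not the paper's exact algorithm: the greedy "most-similar distractor goes to the farthest slot" heuristic and all function names are assumptions for the sake of the example.

```python
def inverse_position(bias_rates):
    """Inverse-Positioning (IP) sketch: given the model's empirical
    selection rate per answer slot, return the least-preferred slot
    so the gold answer can be placed there."""
    return min(range(len(bias_rates)), key=lambda i: bias_rates[i])


def spread_options(gold, distractors, similarity, gold_slot):
    """Semantic-Spread (SS) sketch: place the gold answer at gold_slot
    and push the distractor most similar to the gold answer into the
    slot farthest from it, so near-miss options do not cluster."""
    n = len(distractors) + 1
    ranked = sorted(distractors, key=similarity, reverse=True)
    slots = [s for s in range(n) if s != gold_slot]
    slots.sort(key=lambda s: abs(s - gold_slot), reverse=True)
    options = [None] * n
    options[gold_slot] = gold
    for d, s in zip(ranked, slots):
        options[s] = d
    return options
```

For example, with measured slot preferences `[0.40, 0.20, 0.15, 0.25]`, IP would put the gold answer in slot 2, and SS would then assign the most gold-similar distractor to slot 0, the farthest remaining position.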

SCOPE Pipeline (IP + SS)


🛠️ Quick start

All scripts are designed for reproducibility; you should be able to run the benchmarks within a few minutes.

Clone & setup

# 1) clone
git clone https://github.com/WonjunJeong97/SCOPE.git
cd SCOPE

# 2) Python deps (3.10+)
python -m venv .venv && source .venv/bin/activate   # or: conda create -n scope python=3.10 -y && conda activate scope
pip install -r requirements.txt

Environment variables

cp .env.example .env
# Edit .env with any required API keys/tokens (e.g., OpenAI, HuggingFace) if your model requires them.
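To fail fast before launching an evaluation, you can check that the keys you need are actually set. A minimal stdlib sketch; the key list here is hypothetical, so consult .env.example for the real one:

```python
import os

# Hypothetical required keys -- replace with the ones listed in .env.example.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty (empty list = ready)."""
    return [k for k in required if not env.get(k)]
```

Running `missing_keys()` after loading your .env (e.g. via python-dotenv or `source`) tells you immediately which credentials are still missing.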

Jupyter notebooks

python -m pip install jupyter
jupyter lab
# Open notebooks under notebooks/ and run the first cells to verify your setup.

Quick smoke test (1–2 min)

Run the built-in test mode to verify your installation end to end:

bash scripts/run_evaluation.sh -t
# Optionally pin dataset/model (same test mode, just more explicit):
bash scripts/run_evaluation.sh -t -d csqa -m gpt-3.5-turbo

If it completes without errors, you’re ready to reproduce the paper.

Note: This assumes .env is set up and the fixed datasets exist at the paths in configs/default.yaml.
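Since the smoke test depends on the dataset paths in configs/default.yaml existing on disk, a quick sanity check can save a failed run. A stdlib-only sketch; the `datasets` key name is an assumption, so check the actual schema in configs/default.yaml:

```python
import pathlib


def check_paths(cfg: dict) -> list[str]:
    """Return configured dataset paths that do not exist on disk.

    `cfg` is the parsed YAML config (e.g. via yaml.safe_load); the
    'datasets' mapping of name -> path is a hypothetical schema.
    """
    return [p for p in cfg.get("datasets", {}).values()
            if not pathlib.Path(p).exists()]
```

An empty return value means every configured dataset path was found.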


📁 Repository structure

SCOPE/
├─ configs/        # per-table/figure experiment configs (YAML)
├─ figures/        # static images for README/docs (pipeline, schematics)
├─ scripts/        # download / train / eval / run_all helpers
├─ src/            # core implementation (data, models, utils, train.py, etc.)
├─ notebooks/      # demo & reproduction notebooks
├─ requirements.txt
├─ .env.example    # environment variable template
└─ README.md

📚 Citation

If this repository or the SCOPE framework helps your research, please cite:

@article{jeong2025scope,
  title   = {SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models},
  author  = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  journal = {arXiv preprint arXiv:2507.18182},
  year    = {2025}
}

You may also cite the codebase itself (optional):

@misc{scope_code_2025,
  title        = {SCOPE Codebase},
  author       = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  howpublished = {\url{https://github.com/WonjunJeong97/SCOPE}},
  year         = {2025}
}

🤝 Contact

  • Maintainer: Wonjun Jeong (tp04045@gachon.ac.kr)
  • Questions & issues: please open a GitHub Issue in this repository.

📝 License

This project is released under the terms of the license in LICENSE.
