Bias Evaluation in Synthetic Test Evaluation

Experimental Setup

Dataset

TREC Deep Learning Track 2023

Factors

Query Level: 'infos/query_to_info.txt'

qid: Query ID
Query Length (QL): Indicates whether the query is long (1, more than 10 words) or short (0, 10 words or fewer)
Query Difficulty Real (QDR): Query difficulty for the real query
Query Difficulty Synthetic (QDS): Query difficulty for the synthetic query
Query Word (QW): Number of words in the query, indicating query length
Document Length (DL): Average passage length for each query, based on the qrels
Synthetic: 1 if the query is synthetic (T5- or GPT-4-generated queries)
isGPT4: 1 if the query is GPT-4-generated
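
The relationship between QW and QL described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and splitting on whitespace is an assumed way of counting words, which may differ from what creating-files.ipynb does.

```python
# Threshold taken from the QL description: a query is long if it has more than 10 words.
LONG_QUERY_THRESHOLD = 10


def query_word_count(query: str) -> int:
    """QW: count whitespace-separated words (assumed tokenization)."""
    return len(query.split())


def query_length_flag(query: str) -> int:
    """QL: 1 if the query has more than 10 words, otherwise 0."""
    return 1 if query_word_count(query) > LONG_QUERY_THRESHOLD else 0


if __name__ == "__main__":
    example = "what is the capital of france"
    print(query_word_count(example), query_length_flag(example))
```

A query with exactly 10 words is classified as short, since the boundary is inclusive on the short side.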

Model Level: 'infos/model_to_info.txt'

ST: System Type
isLLM: Indicates whether the pipeline contains an LLM
- This factor is highly correlated with the LLM or model type factor and should not be considered.
MN: Number of Model Variants, i.e. the number of different models in the proposed pipeline (e.g., BM25 for retrieval, GPT-4 for ranking)

Passage Level: 'infos/pass_to_info.txt'

PW: Passage Length, the number of tokens/words in a passage

Notebooks

creating-files.ipynb: Creates the factor data for the linear mixed-effects model analysis
query-analysis.ipynb: Analyses query characteristics
judgement-analysis.ipynb: Creates the Bland-Altman plot
labels-analysis.ipynb: Analyses the judgement distributions by level of judge and source
mixed-effect-analysis.ipynb: Runs the linear mixed-effects model analysis
synthetic-qrel-analysis.ipynb: Runs the system ranking analysis
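
For readers unfamiliar with Bland-Altman plots, the quantities behind one can be sketched as follows. This is a generic illustration of the technique, not the code in judgement-analysis.ipynb: the function name is hypothetical, and the README does not state which paired measurements the notebook compares. For each pair, the plot shows the mean on the x-axis and the difference on the y-axis, with horizontal lines at the mean difference (bias) and at the 95% limits of agreement.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (means, diffs, bias, lower, upper) for paired measurements a and b.

    bias is the mean difference; lower and upper are the 95% limits of
    agreement, bias -/+ 1.96 * sample standard deviation of the differences.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if len(a) < 2:
        raise ValueError("at least two pairs are required")
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return means, diffs, bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical paired scores, for illustration only.
    scores_a = [1, 2, 3, 4]
    scores_b = [1, 1, 3, 3]
    means, diffs, bias, lower, upper = bland_altman(scores_a, scores_b)
    print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f})")
```

The returned means and diffs lists can be passed directly to any scatter-plot function to draw the plot itself.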

Cite

@inproceedings{rahmani2025towards,
  title={Towards Understanding Bias in Synthetic Data for Evaluation},
  author={Rahmani, Hossein A and Ramineni, Varsha and Yilmaz, Emine and Craswell, Nick and Mitra, Bhaskar},
  booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  pages={5166--5170},
  year={2025}
}
