TREC Deep Learning Track 2023
- qid: Query ID
- Query Length (QL): Indicates whether the query is long (1, more than 10 words) or short (0, 10 words or fewer)
- Query Difficulty Real (QDR): Query difficulty for the real query
- Query Difficulty Synthetic (QDS): Query difficulty for the synthetic query
- Query Word (QW): Number of words in the query, indicating query length
- Document Length (DL): Average passage length for each query, based on the qrels
- Synthetic: 1 if the query is synthetic (T5- or GPT-4-generated queries)
- isGPT4: 1 if the query is GPT-4-generated
- ST: System Type
- isLLM: Refers to whether the pipeline contains an LLM in its model
- This factor is highly correlated with the LLM or model type factor and should not be considered.
- MN: Number of Model Variants, refers to the number of different models in the proposed pipeline (e.g., BM25 for retrieval, GPT-4 for ranking)
- PW: Passage Length: The number of tokens/words in a passage
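
The QW and QL factors above can be derived directly from the query text. The following is a minimal sketch, assuming whitespace tokenization (the tokenizer actually used in the notebooks is not stated here); the function name `query_length_factors` is hypothetical and not part of this repository.

```python
from typing import Tuple


def query_length_factors(query: str, threshold: int = 10) -> Tuple[int, int]:
    """Return (QW, QL) for a query string.

    QW is the number of whitespace-separated words (an assumption;
    the notebooks may tokenize differently).
    QL is 1 if QW > threshold (long query), otherwise 0 (short query).
    """
    qw = len(query.split())
    ql = 1 if qw > threshold else 0
    return qw, ql


# Example with a made-up query (not taken from the track data).
print(query_length_factors("what is the capital of france"))  # (6, 0)
```

A query with exactly 10 words is treated as short, matching the definition of QL above.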
- creating-files.ipynb: To create factors data for linear mixed-effect model analysis.
- query-analysis.ipynb: To analyse query characteristics
- judgement-analysis.ipynb: To create the Bland-Altman plot
- labels-analysis.ipynb: To analyse the judgement distributions based on the level of judge and the source
- mixed-effect-analysis.ipynb: To run linear mixed-effect model analysis
- synthetic-qrel-analysis.ipynb: To run system ranking analysis
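
For readers unfamiliar with the Bland-Altman plot mentioned for judgement-analysis.ipynb, the statistics behind it can be sketched as follows. This is an illustrative example using only the Python standard library, not the notebook's actual code; the function name `bland_altman` and the sample values are hypothetical. It computes the mean difference (bias) between two sets of paired measurements and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences).

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower_limit, upper_limit) for paired measurements a and b.

    bias is the mean of the pairwise differences a[i] - b[i];
    the limits of agreement are bias -/+ 1.96 * sample standard deviation.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up paired scores, e.g. one value per query under two label sources.
real = [1, 2, 3, 4]
synthetic = [1, 1, 3, 3]
bias, lower, upper = bland_altman(real, synthetic)
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f})")
```

In the plot itself, each pair is drawn at x = (a[i] + b[i]) / 2 and y = a[i] - b[i], with horizontal lines at the bias and the two limits.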
@inproceedings{rahmani2025towards,
title={Towards Understanding Bias in Synthetic Data for Evaluation},
author={Rahmani, Hossein A and Ramineni, Varsha and Yilmaz, Emine and Craswell, Nick and Mitra, Bhaskar},
booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
pages={5166--5170},
year={2025}
}