Dataset validation

Dear author:

I am currently running mathematical reasoning experiments with two base models, qwen2.5-math-1.5b and qwen2.5-math-7b. I could not find any results for AMC23 or AIME24 in the official technical report. Were the numbers reported in your paper from January this year obtained from your own evaluation?