Hi,
We are trying to reproduce the OpenMath Llama results (Table 3 in the paper), but the two eval tasks used in the scripts (gsm8k_openmath2, math_500_openmath2) are not available in lm_eval. Did you use a specific version of lm_eval, or a different evaluation tool, for these datasets? Thank you!