I am trying to reproduce the results presented in the paper, but my results differ from those reported. To help identify possible sources of the discrepancy, could you share details of the hardware used for the experiments, specifically the GPU model and the number of devices? Additionally, should I use the batch size specified in the configuration file as-is?