Hi, thanks for your inspiring work! The MAE repo's pre-training guidelines call for a large batch size, which suggests that pre-training may require many GPUs for distributed training. I am therefore curious about the computational cost of training. How long did it take to pre-train a ViT? Which GPUs did you use, and how many? Thanks in advance for your response.