Hello,
It seems that the dataloader is not adapted to the distributed setting (line 881 in train.py).
As a result, the same data entries will be loaded and trained on repeatedly by different processes.
Maybe a DistributedSampler should be added, for example:
train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    collate_fn=collate_fn,
    batch_size=args.train_batch_size,
    num_workers=args.dataloader_num_workers,
    drop_last=True,
    # `sampler` is mutually exclusive with `shuffle` in DataLoader,
    # so shuffling is delegated to the DistributedSampler instead.
    sampler=torch.utils.data.distributed.DistributedSampler(
        train_dataset, shuffle=(args.split == 'train')
    ),
)
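One more note, offered as a sketch rather than a definitive fix: when shuffling is enabled, DistributedSampler.set_epoch should be called at the start of every epoch so the shuffling order changes between epochs while staying consistent across processes. Assuming the sampler is kept in a variable (names such as train_sampler and args.num_train_epochs are placeholders, not taken from train.py):

train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset, shuffle=(args.split == 'train')
)
# ... build train_dataloader with sampler=train_sampler as above ...

for epoch in range(args.num_train_epochs):
    # Re-seed the sampler so each epoch uses a different shuffling order
    # that is still identical across all processes.
    train_sampler.set_epoch(epoch)
    for batch in train_dataloader:
        ...  # usual training step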