
Potential issue with the data loader in a distributed setting #15

@dingyuan-shi

Description


Hello,
It seems that the dataloader is not adapted to the distributed setting (line 881 of train.py).
The same data entries will be repeatedly loaded and trained on by different processes.
Maybe a sampler should be added, for example:

train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        collate_fn=collate_fn,
        batch_size=args.train_batch_size,
        num_workers=args.dataloader_num_workers,
        drop_last=True,
        # shuffle is mutually exclusive with sampler in DataLoader,
        # so shuffling is handled by the DistributedSampler instead
        sampler=torch.utils.data.distributed.DistributedSampler(
            train_dataset, shuffle=(args.split == 'train')
        ),
    )
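
If this sampler is used, set_epoch should also be called on it at the start of every epoch, otherwise DistributedSampler reuses the same shuffling order across epochs. A minimal sketch of how the training loop could do this (the loop structure and args.num_train_epochs below are assumptions for illustration, not the actual code in train.py):

for epoch in range(args.num_train_epochs):  # args.num_train_epochs is assumed
        # DistributedSampler only reshuffles per epoch if set_epoch is called;
        # without it, every rank sees the same batch order in every epoch.
        train_dataloader.sampler.set_epoch(epoch)
        for batch in train_dataloader:
            # existing per-batch training step goes here
            ...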
