Multi GPU Training Issue #1

@ShenZheng2000

Hello, authors! Thanks for your excellent work.

I have trouble with multi-GPU training. My command line looks like this:

python train.py --dataroot $dataset_path --name $model_name --gpu 0,1,2,3 --batch_size 1

And the error is below:

Traceback (most recent call last):
  File "/home/shen/Rain/Methods/Decent/train.py", line 49, in <module>
    model.data_dependent_initialize(data)
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 99, in data_dependent_initialize
    self.compute_F_loss().backward()                   # calculate graidents for F
  File "/home/shen/Rain/Methods/Decent/models/decent_gan_model.py", line 189, in compute_F_loss
    assert len(log_prob_a) == self.opt.batch_size * self.opt.num_patches
AssertionError

I printed the values below for debugging:

print(f"{len(log_prob_a)} != {self.opt.batch_size} * {self.opt.num_patches}")

which gives me

0 != 1 * 256

Since len(log_prob_a) is 0, we get an empty list for log_prob_a in multi-GPU training.
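
For anyone trying to reproduce this, here is a minimal sketch of a more descriptive version of the failing check, so the mismatch shows up in the traceback directly. The helper name `check_patch_count` is hypothetical and not part of the repository; it only assumes that `log_prob_a` supports `len()`, as in the assertion above.

```python
def check_patch_count(log_prob_a, batch_size, num_patches):
    """Hypothetical helper (not from the repository).

    Performs the same comparison as the failing assertion, but reports
    the actual and expected lengths when they differ.
    """
    expected = batch_size * num_patches
    actual = len(log_prob_a)
    if actual != expected:
        raise AssertionError(
            f"len(log_prob_a) = {actual}, expected "
            f"batch_size * num_patches = {batch_size} * {num_patches} = {expected}"
        )
    return actual


# A sequence with 256 entries passes for batch_size=1, num_patches=256.
check_patch_count([0.0] * 256, batch_size=1, num_patches=256)

# The case reported here: an empty log_prob_a fails with a readable message.
try:
    check_patch_count([], batch_size=1, num_patches=256)
except AssertionError as err:
    message = str(err)
```

This does not fix anything; it only makes the failure easier to read when comparing single-GPU and multi-GPU runs.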

Did you encounter this issue when training your models? How can I solve it?
