Hello! I've read the Training with Limited Data paper (ADA: Adaptive Discriminator Augmentation), which the EDM paper mentions as having helped performance. In the ADA paper, the augmentations are applied to the images seen by the GAN's discriminator, so as to prevent the generator from producing augmented data.
Since a diffusion model has no discriminator, how does the EDM/diffusion model learn not to produce augmented data in this case?