Class-conditional image generation with generative adversarial networks (GANs) has been investigated through various techniques, but it still suffers from mode collapse, training instability, and low-quality output on datasets with high intra-class variation. Furthermore, most GANs need a large number of iterations to converge, which makes training inefficient. While Diffusion-GAN has shown potential in generating realistic samples, it has a critical limitation in generating class-conditional samples. To overcome these limitations, we propose DuDGAN, a novel approach to class-conditional image generation with GANs that incorporates a dual diffusion-based noise injection process. Our method consists of three networks: a discriminator, a generator, and a classifier. During training, Gaussian-mixture noise is injected into the two noise-aware networks, the discriminator and the classifier, in distinct ways. The noisy data helps prevent overfitting by gradually introducing more challenging tasks, leading to improved model performance. As a result, our method outperforms state-of-the-art conditional GAN models for image generation. We evaluated our method on the AFHQ, Food-101, and CIFAR-10 datasets and observed better FID, KID, precision, and recall scores than the comparison models, highlighting the effectiveness of our approach.
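
The noise injection described above can be illustrated with a minimal sketch. It is not the DuDGAN implementation: it assumes a standard forward-diffusion step with a linear variance schedule, and the names `make_alpha_bar`, `inject_noise`, `max_t`, and `sigma` are hypothetical. Each sample gets a randomly drawn timestep, so the noise over a batch follows a mixture of Gaussians with different variances.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for a linear variance schedule.

    The schedule endpoints are assumed defaults, not values from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def inject_noise(x, alpha_bar, max_t, rng, sigma=1.0):
    """Apply forward-diffusion noise at a random timestep per sample.

    x         : array of shape (batch, ...), e.g. a batch of images
    alpha_bar : output of make_alpha_bar
    max_t     : exclusive upper bound on the sampled timestep; a larger
                value allows stronger noise, i.e. a harder task
    sigma     : extra noise scale (hypothetical knob)
    """
    batch = x.shape[0]
    # One timestep per sample: the batch-level noise is a Gaussian mixture.
    t = rng.integers(0, max_t, size=batch)
    # Reshape so the per-sample factor broadcasts over the remaining axes.
    a = alpha_bar[t].reshape((batch,) + (1,) * (x.ndim - 1))
    eps = rng.standard_normal(x.shape)
    y = np.sqrt(a) * x + np.sqrt(1.0 - a) * sigma * eps
    return y, t
```

In a training loop, one plausible use is to call `inject_noise` separately on the inputs to the discriminator and the classifier, each with its own `max_t` that grows over time. How the two injections actually differ in DuDGAN is not specified in this abstract.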