Score identity Distillation (SiD) is a data-free method that has achieved state-of-the-art performance in image generation by leveraging only a pretrained diffusion model, without requiring any training data. However, its ultimate performance is constrained by how accurately the pretrained model captures the true data scores at different stages of the diffusion process. In this paper, we introduce SiDA (SiD with Adversarial Loss), which not only enhances generation quality but also improves distillation efficiency by incorporating real images and an adversarial loss. SiDA uses the encoder from the generator's score network as a discriminator, boosting its ability to distinguish between real images and those generated by SiD. The adversarial loss is batch-normalized within each GPU and then combined with the original SiD loss. This integration effectively incorporates the average "fakeness" per GPU batch into the pixel-based SiD loss, enabling SiDA to distill a single-step generator either from scratch or by fine-tuning an existing one. SiDA converges significantly faster than its predecessor when trained from scratch, and, when fine-tuning a pre-distilled SiD generator, swiftly improves upon the original model's performance after a brief warmup period. This one-step adversarial distillation method establishes new generation-performance benchmarks when distilling EDM diffusion models pretrained on CIFAR-10 (32x32) and ImageNet (64x64), achieving an FID of 1.110 on ImageNet 64x64. It also sets record-low FID scores when distilling EDM2 models trained on ImageNet (512x512), surpassing even the largest teacher model, EDM2-XXL: SiDA achieves FIDs of 2.156 for EDM2-XS, 1.669 for EDM2-S, 1.488 for EDM2-M, and 1.465 for EDM2-L, a significant improvement across all model sizes. Our open-source code will be integrated into the SiD codebase.
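The loss combination described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function name `sida_loss`, the weight `alpha`, the choice of a non-saturating generator loss, and the specific way the adversarial term is rescaled to the SiD loss's magnitude are all assumptions made for illustration.

```python
import math

def sida_loss(sid_pixel_losses, disc_fake_logits, alpha=1.0):
    """Hedged sketch of combining a per-GPU-batch-normalized adversarial
    term with the pixel-based SiD distillation loss (illustrative only)."""
    # Generator-side non-saturating GAN loss per sample: softplus(-logit).
    # High discriminator logits (images judged "real") give a small penalty.
    adv = [math.log1p(math.exp(-l)) for l in disc_fake_logits]
    mean_adv = sum(adv) / len(adv)          # average "fakeness" over the batch
    mean_sid = sum(sid_pixel_losses) / len(sid_pixel_losses)
    # Rescale the adversarial term to the magnitude of the SiD loss within
    # this (per-GPU) batch, so the two terms live on a comparable scale.
    scale = abs(mean_sid) + 1e-8
    return mean_sid + alpha * mean_adv * scale
```

In a multi-GPU setup, each worker would evaluate this on its own batch before gradients are averaged, which is one plausible reading of "batch-normalized within each GPU".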