Score identity Distillation (SiD) is a data-free method that has achieved state-of-the-art performance in image generation by leveraging only a pretrained diffusion model, without requiring any training data. However, its ultimate performance is constrained by how accurately the pretrained model captures the true data scores at different stages of the diffusion process. In this paper, we introduce SiDA (SiD with Adversarial Loss), which not only enhances generation quality but also improves distillation efficiency by incorporating real images and an adversarial loss. SiDA utilizes the encoder from the generator's score network as a discriminator, boosting its ability to distinguish between real images and those generated by SiD. The adversarial loss is batch-normalized within each GPU and then combined with the original SiD loss. This integration effectively incorporates the average "fakeness" per GPU batch into the pixel-based SiD loss, enabling SiDA to distill a single-step generator either from scratch or by fine-tuning an existing one. SiDA converges significantly faster than its predecessor when trained from scratch, and swiftly improves upon the original model's performance after an initial warmup period when fine-tuning from a pre-distilled SiD generator. This one-step adversarial distillation method establishes new benchmarks in generation performance when distilling EDM diffusion models pretrained on CIFAR-10 (32x32) and ImageNet (64x64), achieving an FID of 1.110 on ImageNet 64x64. It sets record-low FID scores when distilling EDM2 models trained on ImageNet (512x512), surpassing even the largest teacher model, EDM2-XXL. Specifically, SiDA achieves FID scores of 2.156 for EDM2-XS, 1.669 for S, 1.488 for M, 1.413 for L, 1.379 for XL, and 1.366 for XXL, demonstrating significant improvements across all model sizes. Our open-source code will be integrated into the SiD codebase.
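The loss combination described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: the function name `sida_loss`, the non-saturating adversarial term, the normalization constant, and the weight `lam` are all assumptions made for exposition. The key idea shown is that the adversarial losses are normalized by their mean magnitude within the local (per-GPU) batch before being added to the per-sample, pixel-based SiD loss.

```python
import numpy as np

def sida_loss(sid_pixel_loss, adv_logits_fake, lam=1.0):
    """Hypothetical sketch of combining the pixel-based SiD loss with an
    adversarial term that is normalized within the local (per-GPU) batch.

    sid_pixel_loss:  (B,) per-sample SiD distillation losses
    adv_logits_fake: (B,) discriminator logits on generated samples
    lam:             illustrative weight on the adversarial term
    """
    # Assumed non-saturating generator objective: higher logits = "more real".
    adv_loss = -adv_logits_fake
    # Normalize by the batch-mean magnitude on this GPU so the adversarial
    # term's scale is comparable to the pixel-based SiD loss; this injects
    # the average "fakeness" of the batch into the combined objective.
    adv_norm = adv_loss / (np.abs(adv_loss).mean() + 1e-8)
    return float((sid_pixel_loss + lam * adv_norm).mean())
```

In a multi-GPU setup, each replica would compute this normalization over its own batch before gradients are averaged across devices.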