Guidance in conditional diffusion generation is critical for sample quality and controllability. However, existing guidance schemes leave much to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and cannot adapt to new conditions. On the other hand, training-free methods such as universal guidance, though more flexible, have yet to demonstrate comparable performance. In this work, through a comprehensive investigation of the design space, we show that significant performance improvements over existing guidance schemes can be achieved by leveraging off-the-shelf classifiers in a training-free fashion, enjoying the best of both worlds. Employing calibration as a general guideline, we propose several pre-conditioning techniques to better exploit pretrained off-the-shelf classifiers for guiding diffusion generation. Extensive experiments on ImageNet validate our proposed method, showing that state-of-the-art diffusion models (DDPM, EDM, DiT) can be further improved (by up to 20%) using off-the-shelf classifiers with barely any extra computational cost. With the proliferation of publicly available pretrained classifiers, our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks. The code is available at https://github.com/AlexMaOLS/EluCD/tree/main.
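To make the notion of calibration concrete, the following is a minimal sketch of how a classifier's calibration might be measured. It assumes the expected calibration error (ECE) as the metric and a softmax temperature as an illustrative adjustment; the abstract does not state which metric or which pre-conditioning techniques are used, so the function names, the binning scheme, and the temperature parameter here are all hypothetical choices rather than the method itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities.

    `temperature` is an illustrative knob (an assumption, not taken from
    the abstract): values above 1 soften the distribution.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width confidence bins.

    Each sample is placed in a bin by its top-class confidence; the result
    is the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(labels)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy logits and labels, made up purely for illustration.
    toy_logits = [[4.0, 0.5, 0.1], [0.2, 3.0, 0.3], [2.5, 2.4, 0.0], [0.1, 0.2, 3.5]]
    toy_labels = [0, 1, 1, 2]
    for t in (1.0, 2.0):
        probs = [softmax(z, temperature=t) for z in toy_logits]
        print(f"temperature={t}: ECE={expected_calibration_error(probs, toy_labels):.4f}")
```

A lower ECE means the classifier's confidence tracks its accuracy more closely, which is the property the abstract's "calibration as a general guideline" appears to target; how that maps onto guidance at each diffusion step is left to the full paper.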