Diffusion models (DMs) excel in image generation but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers for DMs accelerate denoising inference, they often lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to our key insights into the inference processes: (1) data-prediction parameterization outperforms its noise-prediction counterpart, and (2) optimizing the conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance-optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5\% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25\% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.