Diffusion models have achieved great success in generating diverse, high-fidelity images. These performance gains come at the cost of slow per-image generation, which hinders the application of diffusion models in real-time scenarios. While some predictions benefit from the full computation of the model at each sampling iteration, not every iteration requires the same amount of computation, so running the full model at every step can waste computation. In this work, we propose DeeDiff, an early exiting framework that adaptively allocates computation resources at each sampling step to improve the generation efficiency of diffusion models. Specifically, we introduce a timestep-aware uncertainty estimation module (UEM), attached to each intermediate layer, that estimates the prediction uncertainty of that layer. The estimated uncertainty serves as the signal for deciding whether inference terminates at that layer. Moreover, we propose an uncertainty-aware layer-wise loss to close the performance gap between full models and early-exited models. With this loss strategy, our model obtains results comparable to those of full-layer models. Extensive experiments on class-conditional, unconditional, and text-guided generation across several datasets show that our method achieves a state-of-the-art trade-off between performance and efficiency compared with existing early exiting methods for diffusion models. More importantly, our method even brings extra benefits to baseline models and obtains better performance on the CIFAR-10 and Celeb-A datasets. The full code and model are released for reproduction.
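The early-exit decision described in the abstract can be illustrated with a minimal sketch. It is not the released DeeDiff implementation: the toy layers, the sigmoid-based uncertainty head, the names `make_layer`, `make_uncertainty_head`, and `early_exit_forward`, and the fixed `threshold` rule are all assumptions made for illustration. The sketch only shows the control flow: after each layer, a timestep-aware head scores the hidden state, and the forward pass stops as soon as the score falls below the threshold.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
UncertaintyHead = Callable[[Vector, int], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(scale: float) -> Layer:
    """Toy stand-in for one backbone layer (hypothetical): scales activations."""
    return lambda h: [scale * v for v in h]


def make_uncertainty_head(weight: float, time_weight: float) -> UncertaintyHead:
    """Toy timestep-aware head (hypothetical): maps (hidden state, t) to (0, 1)."""
    def head(h: Vector, t: int) -> float:
        mean_abs = sum(abs(v) for v in h) / len(h)
        return sigmoid(weight * mean_abs + time_weight * t)
    return head


def early_exit_forward(
    x: Sequence[float],
    t: int,
    layers: Sequence[Layer],
    heads: Sequence[UncertaintyHead],
    threshold: float,
) -> Tuple[Vector, int]:
    """Run layers in order; stop once the estimated uncertainty is below threshold.

    Returns the hidden state at the exit point and the number of layers used.
    """
    h = list(x)
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        if head(h, t) < threshold:  # assumed exit rule: low uncertainty => exit
            return h, depth
    return h, len(layers)  # no early exit: the full model was used


if __name__ == "__main__":
    layers = [make_layer(0.5) for _ in range(4)]
    heads = [make_uncertainty_head(1.0, 0.1) for _ in range(4)]
    for t in (0, 10):
        _, used = early_exit_forward([4.0, -4.0], t, layers, heads, threshold=0.7)
        print(f"timestep {t}: used {used} of {len(layers)} layers")
```

In this toy setup the same input exits earlier at one timestep than at another, which is the kind of per-step adaptive computation the abstract describes. In a real diffusion backbone the layers would be network blocks and the head a small learned module; those details are not given in the abstract and are left out here.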