The field of image generation has made significant progress thanks to the introduction of Diffusion Models, which learn to progressively reverse a given image corruption. Recently, a few studies introduced alternative ways of corrupting images in Diffusion Models, with an emphasis on blurring. However, these studies are purely empirical, and it remains unclear what the optimal procedure for corrupting an image is. In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. We propose the Fisher metric for the path length, measured in the space of probability distributions. We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening (rather than blurring) and noise deblurring. While the corruption was chosen arbitrarily in previous work, our Shortest Path Diffusion (SPD) uniquely determines the entire spatiotemporal structure of the corruption. We show that SPD improves on strong baselines without any hyperparameter tuning and outperforms all previous Diffusion Models based on image blurring. Furthermore, any small deviation from the shortest path leads to worse performance, suggesting that SPD provides the optimal procedure to corrupt images. Our work sheds new light on observations made in recent works and provides a new approach to improving Diffusion Models on images and other types of data.
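To make the idea of a path length measured with the Fisher metric concrete, the sketch below handles the simplest possible case: a path of one-dimensional Gaussians N(μ(t), σ(t)²). It is only an illustration, not the SPD computation. The paper's actual setting (corrupting whole images) is not reproduced, and the function name `fisher_path_length` is hypothetical. The sketch relies on the standard Fisher information metric of the univariate Gaussian in (μ, σ) coordinates, ds² = (dμ² + 2 dσ²) / σ², and integrates the resulting speed numerically.

```python
import numpy as np


def fisher_path_length(mu, sigma, t):
    """Approximate the Fisher-Rao length of a path of 1-D Gaussians.

    Hypothetical helper for illustration: the path is N(mu(t), sigma(t)^2),
    sampled at times t. It uses the univariate Gaussian Fisher metric
    ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    t = np.asarray(t, dtype=float)

    # Finite-difference time derivatives of the path parameters.
    dmu = np.gradient(mu, t)
    dsigma = np.gradient(sigma, t)

    # Instantaneous speed under the Fisher metric.
    speed = np.sqrt(dmu**2 + 2.0 * dsigma**2) / sigma

    # Trapezoid rule, written out so it does not depend on np.trapz/np.trapezoid.
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t)))


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1001)
    # Shift the mean from 0 to 1 at unit noise: length is 1.
    print(fisher_path_length(t, np.ones_like(t), t))
    # Grow the noise from 1 to e at zero mean: length is sqrt(2) * ln(e) = sqrt(2).
    print(fisher_path_length(np.zeros_like(t), np.exp(t), t))
```

Because the speed is divided by σ, the same change in the mean costs more at low noise than at high noise. Even in this toy case, the length of a corruption path therefore depends on how the signal change and the noise level are scheduled together.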