Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse content. Despite this progress, the smoothness of diffusion model latent spaces remains largely unexplored. In a smooth latent space, a perturbation of an input latent produces a correspondingly steady change in the output image. This property benefits downstream tasks such as image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by showing that minor latent variations can cause noticeable visual fluctuations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that are simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization, which enforces that the ratio between the variation of an arbitrary input latent and the variation of the output image remains constant at every diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion is a more desirable solution not only for T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA that works with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.
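As a rough illustration of the idea behind an interpolation standard deviation metric, the sketch below generates outputs along evenly spaced interpolation points between two latents, measures the distance between each pair of neighboring outputs, and reports the standard deviation of those distances. A smooth latent space should produce near-uniform steps and therefore a low value. The linear interpolation, the Euclidean distance, and the names `interpolation_std`, `generate`, `l2_distance`, and `num_points` are hypothetical choices made for illustration. They are not the authors' definition or code, and the exact evaluation protocol should be taken from the released repository.

```python
import statistics
from typing import Callable, List, Sequence

Vector = List[float]


def lerp(z0: Sequence[float], z1: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between two latents (assumed scheme; the paper may differ)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two flattened images (hypothetical stand-in metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def interpolation_std(
    generate: Callable[[Vector], Vector],
    z0: Sequence[float],
    z1: Sequence[float],
    num_points: int = 11,
    distance: Callable[[Sequence[float], Sequence[float]], float] = l2_distance,
) -> float:
    """Standard deviation of distances between consecutive generated outputs.

    `generate` is a hypothetical latent-to-image function; lower values mean
    the outputs change more evenly along the interpolation path.
    """
    if num_points < 3:
        raise ValueError("num_points must be at least 3 to yield two or more gaps")
    # Evenly spaced interpolation ratios from 0 to 1 inclusive.
    ts = [i / (num_points - 1) for i in range(num_points)]
    images = [generate(lerp(z0, z1, t)) for t in ts]
    # Distance between each pair of neighboring outputs.
    gaps = [distance(images[i], images[i + 1]) for i in range(num_points - 1)]
    # Population standard deviation of the step sizes.
    return statistics.pstdev(gaps)


if __name__ == "__main__":
    # A linear "generator" changes its output at a constant rate (smooth).
    smooth = lambda z: [2.0 * v for v in z]
    # A thresholding "generator" jumps abruptly at one point (non-smooth).
    jumpy = lambda z: [1.0 if v > 0.5 else 0.0 for v in z]
    print(interpolation_std(smooth, [0.0, 0.0], [1.0, 1.0]))
    print(interpolation_std(jumpy, [0.0, 0.0], [1.0, 1.0]))
```

In this toy comparison, the linear generator yields a value that is zero up to floating-point error, while the thresholding generator concentrates all change in a single step and yields a clearly positive value. Using the standard deviation of step sizes, rather than their mean, is what makes the score sensitive to uneven transitions instead of to the overall amount of change.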