Diffusion models have achieved remarkable image generation quality, surpassing previous generative models. However, a notable limitation of diffusion models, in comparison to GANs, is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such smooth interpolation is appealing because it naturally serves as a solution to the image morphing task, which has many applications. In this work, we present DiffMorpher, the first approach enabling smooth and natural image interpolation using diffusion models. Our key idea is to capture the semantics of the two images by fitting a separate LoRA (low-rank adaptation) to each of them, and then to interpolate both the LoRA parameters and the latent noises to ensure a smooth semantic transition, where correspondence emerges automatically without the need for annotation. In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images. Extensive experiments demonstrate that DiffMorpher achieves substantially better image morphing effects than previous methods across a variety of object categories, bridging a critical functional gap that distinguished diffusion models from GANs.
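The core idea of interpolating both the LoRA parameters and the latent noises can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the function names (`lerp`, `slerp`, `morph_inputs`) are hypothetical, LoRA parameters are represented as plain lists of floats keyed by layer name, the LoRA parameters are blended linearly, and the latent noises are blended with spherical interpolation, a common choice for Gaussian noise. The interpolation weight is spaced uniformly here; the proposed sampling schedule and the attention interpolation and injection step are not modeled.

```python
import math


def lerp(a, b, alpha):
    """Linear interpolation between two equal-length lists of floats."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def slerp(z0, z1, alpha):
    """Spherical interpolation between two noise vectors.

    Assumption: the abstract only says the latent noises are interpolated;
    slerp is used here because it preserves the norm of Gaussian noise.
    """
    dot = sum(x * y for x, y in zip(z0, z1))
    n0 = math.sqrt(sum(x * x for x in z0))
    n1 = math.sqrt(sum(y * y for y in z1))
    cos_phi = max(-1.0, min(1.0, dot / (n0 * n1)))
    phi = math.acos(cos_phi)
    s = math.sin(phi)
    if abs(s) < 1e-6:
        # Nearly parallel or antiparallel vectors: fall back to lerp.
        return lerp(z0, z1, alpha)
    w0 = math.sin((1.0 - alpha) * phi) / s
    w1 = math.sin(alpha * phi) / s
    return [w0 * x + w1 * y for x, y in zip(z0, z1)]


def morph_inputs(lora_a, lora_b, noise_a, noise_b, num_frames):
    """Return (alpha, blended LoRA, blended noise) for each morph frame.

    lora_a and lora_b are hypothetical dicts mapping a layer name to a
    flat list of LoRA weights fitted to image A and image B.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for k in range(num_frames):
        alpha = k / (num_frames - 1)  # uniform spacing (a simplification)
        lora = {name: lerp(lora_a[name], lora_b[name], alpha) for name in lora_a}
        noise = slerp(noise_a, noise_b, alpha)
        frames.append((alpha, lora, noise))
    return frames
```

In a full pipeline, each frame's blended LoRA would be loaded into the diffusion model and the blended noise would be denoised to produce one intermediate image; those steps depend on the specific model and are left out of this sketch.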