We study the problem of generating intermediate images between a pair of images with large motion while maintaining semantic consistency. When the motion is large, the semantic information needed for the intermediate images may be absent from the input images. Existing methods are either limited to small motion or focus on topologically similar objects, which leads to artifacts and inconsistency in the interpolation results. To overcome this challenge, we draw on pre-trained image diffusion models for their semantic understanding and representation capabilities, so that the missing intermediate semantic content is expressed consistently with the inputs. To this end, we propose DreamMover, a novel image interpolation framework with three main components: 1) a natural flow estimator based on the diffusion model that implicitly reasons about the semantic correspondence between the two images; 2) a fusion scheme that combines information in two parts, a high-level space and a low-level space, to avoid losing detail during fusion; and 3) a self-attention concatenation and replacement approach that enhances the consistency between the generated images and the inputs. Lastly, we present InterpBench, a challenging benchmark dataset for evaluating the semantic consistency of generated results. Extensive experiments demonstrate the effectiveness of our method. Our project is available at https://dreamm0ver.github.io .
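The abstract does not spell out how self-attention concatenation and replacement works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the queries come from the intermediate image, and that the intermediate image's own keys and values are replaced by the concatenated keys and values of the two input images. The function names `attend` and `concat_attend` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def concat_attend(queries, kv_a, kv_b):
    """Assumed reading of 'concatenation and replacement': the intermediate
    image's queries attend over the keys/values of both inputs, which stand
    in for the intermediate image's own keys/values."""
    keys_a, values_a = kv_a
    keys_b, values_b = kv_b
    return attend(queries, keys_a + keys_b, values_a + values_b)
```

Under this assumption, every output token is a convex combination of value vectors taken only from the two input images, which is one way such a mechanism could keep generated content consistent with the inputs.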