The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term. Conventional techniques focused on defining hand-designed prior terms, which are difficult to formulate. Recent approaches have instead focused on learning the data term with deep neural networks without explicitly modeling the prior, assuming that the model itself has the capacity to learn an optimal prior from a large-scale dataset. Although this clearly improved performance, these approaches often fail to address inherent ambiguities of matching, such as textureless regions, repetitive patterns, and large displacements. To address this, we propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms. Unlike previous approaches, DiffMatch accomplishes this by leveraging a conditional denoising diffusion model. It consists of two main components: a conditional denoising diffusion module and a cost injection module. We stabilize the training process and reduce memory usage with a stage-wise training strategy. Furthermore, to boost performance, we introduce an inference technique that finds a better path to the accurate matching field. Our experimental results demonstrate significant performance improvements of our method over existing approaches, and the ablation studies validate our design choices along with the effectiveness of each component. The project page is available at https://ku-cvlab.github.io/DiffMatch/.
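One way to read the two-term objective is as maximum a posteriori estimation of the matching field. The sketch below is an illustration under assumed notation that does not appear in the abstract: $F$ denotes the dense matching field and $D$ denotes the evidence extracted from the image pair (for example, feature descriptors). Applying Bayes' rule and taking the logarithm splits the posterior into the two terms:

```latex
\[
\begin{aligned}
F^{*} &= \arg\max_{F} \; p(F \mid D) \\
      &= \arg\max_{F} \; p(D \mid F)\, p(F) \\
      &= \arg\max_{F} \; \bigl\{ \underbrace{\log p(D \mid F)}_{\text{data term}}
         + \underbrace{\log p(F)}_{\text{prior term}} \bigr\}
\end{aligned}
\]
```

Under this reading, methods that learn only the data term leave $\log p(F)$ implicit in the network weights, whereas a generative model over $F$ conditioned on the image pair can represent both terms jointly.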