Dense feature matching is an important computer vision task that involves estimating all correspondences between two images of a 3D scene. In this paper, we revisit robust losses for matching from a Markov chain perspective, yielding theoretical insights and large gains in performance. We begin by constructing a unifying formulation of matching as a Markov chain, from which we identify two key stages that we argue should be decoupled. The first is the coarse stage, where the estimated result needs to be globally consistent. The second is the refinement stage, where the model needs precise localization capabilities. Because these stages address distinct issues, we propose a coarse matcher following the regression-by-classification paradigm, which provides globally consistent, albeit not exactly localized, matches. This is followed by a local feature refinement stage using well-motivated robust regression losses, yielding extremely precise matches. Our proposed approach, which we call RoMa, achieves significant improvements over the state of the art. Code is available at https://github.com/Parskatt/RoMa
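To make the two decoupled stages concrete, the following is a minimal illustrative sketch, not the RoMa implementation. It assumes a hypothetical 4x4 grid of anchor positions for the coarse stage, decodes a coarse match by classifying over those anchors (regression-by-classification), and scores a refined match with a Charbonnier penalty, which stands in here for a generic robust regression loss. The function names, the anchor grid, and the choice of Charbonnier are all assumptions made for illustration.

```python
import math

def decode_coarse_match(logits, anchors):
    """Regression-by-classification: classify over anchors, return the most likely one.

    logits:  list of K unnormalized scores, one per anchor (hypothetical model output).
    anchors: list of K (x, y) anchor coordinates in the second image.
    """
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    best = max(range(len(probs)), key=probs.__getitem__)
    return anchors[best], probs

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier penalty: about quadratic near zero, about linear for large residuals."""
    residual = math.dist(pred, target)
    return math.sqrt(residual ** 2 + eps ** 2)

# Hypothetical 4x4 anchor grid on normalized image coordinates in [-1, 1].
grid = [-0.75, -0.25, 0.25, 0.75]
anchors = [(x, y) for y in grid for x in grid]

# Coarse stage: a peaked score at anchor 5 gives a globally consistent but
# only grid-accurate match.
logits = [0.0] * len(anchors)
logits[5] = 4.0
coarse, probs = decode_coarse_match(logits, anchors)

# Refinement stage: a (here hand-picked) local offset is added to the coarse
# match, and the result is scored against the ground truth with the robust loss.
refined = (coarse[0] + 0.05, coarse[1] - 0.02)
target = (-0.20, -0.27)
loss = charbonnier(refined, target)
```

The coarse stage only has to pick the right anchor, so it can represent multimodal match distributions. The refinement stage only has to correct a small residual, where a robust penalty limits the influence of outliers.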