Finding corresponding pixels within a pair of images is a fundamental computer vision task with many applications. Because tasks such as optical flow estimation and local feature matching have different requirements, previous works fall mainly into two categories, dense matching and sparse feature matching, each relying on specialized architectures and task-specific datasets, which can limit the generalization performance of the resulting models. In this paper, we propose a deep model for both sparse and dense matching, termed RGM (Robust Generalist Matching). In particular, we design a cascaded GRU module that refines correspondences by iteratively exploring geometric similarity at multiple scales, followed by an additional uncertainty estimation module for sparsification. To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth by generating optical flow supervision with larger intervals. This allows us to mix various dense and sparse matching datasets, significantly increasing training diversity. The generalization capacity of RGM is greatly improved by learning matching and uncertainty estimation in a two-stage manner on the large, mixed data. RGM achieves superior performance on zero-shot matching and downstream geometry estimation across multiple datasets, outperforming previous methods by a large margin.
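As a rough illustration of how an uncertainty estimate can turn a dense matching result into a sparse one, the sketch below keeps only those pixels whose uncertainty falls below a cutoff. It is not the RGM implementation: the function name `sparsify_matches`, the fixed `threshold`, the convention that higher scores mean less reliable matches, and the nested-list data layout are all assumptions made for this example.

```python
def sparsify_matches(flow, uncertainty, threshold=0.5):
    """Select sparse correspondences from a dense flow field.

    Hypothetical helper for illustration only (not taken from the paper).
    flow[y][x] is a (dx, dy) displacement from image A to image B.
    uncertainty[y][x] is assumed to lie in [0, 1], higher = less reliable.
    Returns a list of ((x, y), (x + dx, y + dy)) pixel correspondences.
    """
    matches = []
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            # Keep a pixel only if the model is confident about its match.
            if uncertainty[y][x] < threshold:
                matches.append(((x, y), (x + dx, y + dy)))
    return matches


# Toy 2x2 dense prediction: two confident pixels, two uncertain ones.
flow = [[(1.0, 0.0), (0.5, 0.5)],
        [(0.0, 2.0), (-1.0, 0.0)]]
uncertainty = [[0.1, 0.9],
               [0.4, 0.7]]

sparse = sparsify_matches(flow, uncertainty)
print(sparse)  # [((0, 0), (1.0, 0.0)), ((0, 1), (0.0, 3.0))]
```

A fixed cutoff is the simplest possible selection rule; keeping the top-k most confident pixels would be an equally plausible choice under the same assumptions.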