Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications. Because tasks such as optical flow estimation and local feature matching have different requirements, previous works fall primarily into two categories, dense matching and sparse feature matching, each relying on specialized architectures and task-specific datasets, which may hinder the generalization performance of the resulting models. In this paper, we propose a deep model for both sparse and dense matching, termed RGM (Robust Generalist Matching). In particular, we design a cascaded GRU module that refines matches by iteratively exploring geometric similarity at multiple scales, followed by an additional uncertainty estimation module for sparsification. To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth by generating optical flow supervision with greater intervals. This allows us to mix various dense and sparse matching datasets, significantly improving training diversity. The generalization capacity of RGM is greatly improved by learning matching and uncertainty estimation in a two-stage manner on the large, mixed data. RGM achieves superior performance on zero-shot matching and downstream geometry estimation across multiple datasets, outperforming previous methods by a large margin.
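To make the sparsification step concrete, the following is a minimal sketch of how a dense flow field could be reduced to sparse correspondences using a per-pixel uncertainty map. It is an illustration under stated assumptions, not the RGM implementation: the function name `sparsify_matches`, the fixed `threshold`, the convention that lower values mean more reliable matches, and the in-bounds filter are all hypothetical choices, since the abstract does not specify how the uncertainty output is used to select matches.

```python
import numpy as np


def sparsify_matches(flow, uncertainty, threshold=0.5):
    """Select confident correspondences from a dense flow field.

    Hypothetical helper, not taken from the paper.
    flow:        (H, W, 2) array of per-pixel displacements (dx, dy).
    uncertainty: (H, W) array; lower means more reliable (assumed convention).
    Returns (src, dst), each of shape (N, 2) in (x, y) pixel coordinates.
    """
    h, w = uncertainty.shape

    # Keep only pixels whose uncertainty is below the threshold.
    ys, xs = np.nonzero(uncertainty < threshold)
    src = np.stack([xs, ys], axis=1).astype(np.float32)

    # A dense match maps each source pixel to source + displacement.
    dst = src + flow[ys, xs]

    # Discard matches that land outside the second image.
    inside = (
        (dst[:, 0] >= 0) & (dst[:, 0] <= w - 1)
        & (dst[:, 1] >= 0) & (dst[:, 1] <= h - 1)
    )
    return src[inside], dst[inside]
```

The returned point pairs are in the form that downstream geometry estimation, such as a robust pose solver, typically consumes. In practice the threshold would likely be tuned per dataset, or replaced by a top-k selection.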