Finding corresponding pixels within a pair of images is a fundamental computer vision task with many applications. Because tasks such as optical flow estimation and local feature matching have different requirements, previous works fall mainly into two categories, dense matching and sparse feature matching, each relying on specialized architectures and task-specific datasets, which may limit the generalization performance of the resulting models. In this paper, we propose a deep model for both sparse and dense matching, termed RGM (Robust Generalist Matching). In particular, we design a cascaded GRU module that refines matches by iteratively exploring geometric similarity at multiple scales, followed by an additional uncertainty estimation module for sparsification. To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth by generating optical flow supervision at larger intervals. This allows us to mix various dense and sparse matching datasets, significantly improving training diversity. Learning matching and uncertainty estimation in a two-stage manner on this large, mixed data greatly improves the generalization capacity of RGM. RGM achieves superior performance in zero-shot matching and downstream geometry estimation across multiple datasets, outperforming previous methods by a large margin.
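To illustrate how a dense prediction can be turned into sparse correspondences, the following is a minimal sketch, not the RGM implementation. It assumes the uncertainty module yields a per-pixel confidence map in [0, 1] and that sparsification amounts to simple thresholding. The function name `sparsify_matches` and the `threshold` value are hypothetical and do not come from the paper.

```python
import numpy as np


def sparsify_matches(flow, confidence, threshold=0.5):
    """Select sparse correspondences from a dense flow field.

    flow:       (H, W, 2) array of per-pixel displacements (dx, dy).
    confidence: (H, W) array in [0, 1]; higher means more reliable.
    threshold:  hypothetical cutoff, not a value taken from the paper.

    Returns two (N, 2) arrays of (x, y) points: source and target.
    """
    h, w = confidence.shape

    # Keep only the pixels whose confidence exceeds the cutoff.
    ys, xs = np.nonzero(confidence > threshold)
    src = np.stack([xs, ys], axis=1).astype(np.float64)

    # A match is the source pixel shifted by its predicted flow.
    dst = src + flow[ys, xs]

    # Discard matches that land outside the second image.
    inside = (
        (dst[:, 0] >= 0) & (dst[:, 0] <= w - 1)
        & (dst[:, 1] >= 0) & (dst[:, 1] <= h - 1)
    )
    return src[inside], dst[inside]
```

The resulting point pairs could then be passed to a downstream geometry estimator, such as a robust essential matrix or homography solver.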