Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation

Correspondence matching plays a crucial role in numerous robotics applications. Compared with conventional hand-crafted methods and recent data-driven approaches, plug-and-play algorithms have attracted significant interest: they use pre-trained backbone networks for multi-scale feature extraction and hierarchical refinement strategies to generate matched correspondences. This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach. First, we eliminate the pre-defined threshold used in DFM's hierarchical refinement process by adopting a more flexible nearest neighbor search strategy, which prevents repetitive yet valid matches from being excluded during the early stages. Second, we integrate a patch descriptor, which extends DFM to a wide range of backbone networks pre-trained on diverse computer vision tasks, including image classification, semantic segmentation, and stereo matching. To make our method practical for real-world robotics applications, we also propose a novel patch descriptor distillation strategy that further reduces the computational complexity of correspondence matching. Extensive experiments on three public datasets demonstrate the superior performance of our method. Specifically, on the HPatches dataset it achieves overall mean matching accuracies of 0.68, 0.92, and 0.95 at tolerances of 1, 3, and 5 pixels, respectively, outperforming all other SoTA algorithms. Our source code, demo video, and supplement are publicly available at mias.group/GCM.
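
For readers unfamiliar with the metric, the following is a minimal sketch of how matching accuracy at a pixel tolerance is commonly computed on homography datasets such as HPatches. It is an illustration under stated assumptions, not the evaluation code of this paper: the function name `matching_accuracy`, its arguments, and the convention that the ground-truth homography `H` maps image A to image B are all hypothetical choices made here.

```python
import numpy as np

def matching_accuracy(pts_a, pts_b, H, thresholds=(1, 3, 5)):
    """Fraction of matches whose reprojection error is within each pixel tolerance.

    pts_a, pts_b: (N, 2) arrays of matched (x, y) coordinates in images A and B.
    H: 3x3 ground-truth homography, assumed here to map image A to image B.
    """
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    if len(pts_a) == 0:
        # Assumption: an image pair with no matches scores zero at every tolerance.
        return {t: 0.0 for t in thresholds}
    # Lift points from image A to homogeneous coordinates and project into image B.
    ones = np.ones((len(pts_a), 1))
    proj = np.hstack([pts_a, ones]) @ np.asarray(H, dtype=float).T
    proj = proj[:, :2] / proj[:, 2:3]
    # Euclidean distance between each projected point and its matched point.
    err = np.linalg.norm(proj - pts_b, axis=1)
    return {t: float(np.mean(err <= t)) for t in thresholds}

# Toy example with an identity homography; reprojection errors are 0.5, 2, 4, 10.
pts_a = [[0, 0], [10, 10], [20, 20], [30, 30]]
pts_b = [[0.5, 0], [10, 12], [20, 24], [30, 40]]
acc = matching_accuracy(pts_a, pts_b, np.eye(3))
```

Averaging these per-pair values over all image pairs in a dataset gives the mean matching accuracy at each tolerance.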
