Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose VOP, a novel visual place recognition approach that shifts from the traditional reliance on global image similarities and local features to image overlap prediction. VOP identifies co-visible image sections by obtaining patch-level embeddings with a Vision Transformer backbone and establishing patch-to-patch correspondences, without requiring expensive feature detection and matching. A voting mechanism then assesses overlap scores for candidate database images, providing a nuanced retrieval metric in challenging scenarios. Experiments on several large-scale, real-world indoor and outdoor benchmarks show that VOP yields more accurate relative pose estimation and localization on the retrieved image pairs than state-of-the-art baselines. The code is available at https://github.com/weitong8591/vop.git.
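The patch-voting idea can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes only that each image is represented by an array of patch embeddings (one row per ViT patch) and that a patch "votes" for co-visibility when its best cosine match in the other image exceeds a threshold (the function name and threshold are hypothetical):

```python
import numpy as np

def overlap_score(query_patches, db_patches, sim_thresh=0.8):
    """Hypothetical sketch: score the overlap between two images from
    patch embeddings (N x D arrays), as a fraction of voting patches."""
    # L2-normalize rows so dot products become cosine similarities.
    q = query_patches / np.linalg.norm(query_patches, axis=1, keepdims=True)
    d = db_patches / np.linalg.norm(db_patches, axis=1, keepdims=True)
    sim = q @ d.T                        # N_q x N_d cosine similarity matrix
    best = sim.max(axis=1)               # best-matching database patch per query patch
    votes = (best >= sim_thresh).sum()   # patches voting for co-visibility
    return votes / len(best)             # overlap score in [0, 1]
```

Under this sketch, ranking database images by their overlap score replaces a single global-descriptor distance with an aggregate of local patch agreements, which is what makes the metric robust to partial overlap.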