Visual place recognition (VPR) is a fundamental computer vision task for visual localization. Existing methods are trained on image pairs labeled as either depicting the same place or not. Such a binary label ignores the continuous relations of similarity between images of the same place taken from different positions, which follow from the continuous nature of camera pose. Binary similarity therefore introduces a noisy supervision signal into the training of VPR methods, causing them to stall in local minima and to require expensive hard-mining algorithms to guarantee convergence. Motivated by the fact that two images of the same place share visual cues only partially because of camera pose differences, we deploy an automatic re-annotation strategy to re-label VPR datasets. We compute graded similarity labels for image pairs from the available localization metadata. Furthermore, we propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels to train contrastive networks. We demonstrate that the new labels and GCL make hard-pair mining unnecessary and yield image descriptors that perform better in VPR by nearest-neighbor search, with results superior or comparable to those of methods that require expensive hard-pair mining and re-ranking techniques. Code and models are available at: https://github.com/marialeyvallina/generalized_contrastive_loss
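To make the idea of graded supervision concrete, the following minimal Python sketch shows one way a graded label and a contrastive loss weighted by it could be written. The abstract does not give the labeling rule or the loss formula, so everything here is an assumption made for illustration: the function names, the linear decay with distance and heading difference, the thresholds `max_dist` and `max_angle_deg`, the margin value, and the form of the loss (the classic contrastive loss with the binary label replaced by a value `psi` in [0, 1]). The released code at the URL above is the reference for the actual method.

```python
import math


def graded_similarity(pos_a, pos_b, heading_a_deg, heading_b_deg,
                      max_dist=25.0, max_angle_deg=40.0):
    """Hypothetical graded label in [0, 1] from camera position and heading.

    Assumption: similarity decays linearly with the translation distance and
    with the heading difference. The thresholds are illustrative only.
    """
    dist = math.dist(pos_a, pos_b)
    # Smallest absolute angle between the two headings, in [0, 180] degrees.
    angle = abs((heading_a_deg - heading_b_deg + 180.0) % 360.0 - 180.0)
    dist_term = max(0.0, 1.0 - dist / max_dist)
    angle_term = max(0.0, 1.0 - angle / max_angle_deg)
    return dist_term * angle_term


def generalized_contrastive_loss(desc_a, desc_b, psi, margin=0.5):
    """Contrastive loss for one pair with a graded label psi (assumed form).

    psi weights the attracting term and (1 - psi) the repelling term, so a
    partially similar pair is pulled together only partially.
    """
    d = math.dist(desc_a, desc_b)            # Euclidean descriptor distance
    attract = 0.5 * d ** 2                   # penalizes distance
    repel = 0.5 * max(margin - d, 0.0) ** 2  # penalizes closeness within margin
    return psi * attract + (1.0 - psi) * repel


# Usage: two views 5 m apart whose headings differ by 20 degrees.
psi = graded_similarity((0.0, 0.0), (5.0, 0.0), 350.0, 10.0)
loss = generalized_contrastive_loss((0.1, 0.2), (0.3, 0.1), psi)
```

In this assumed form, setting `psi` to exactly 0 or 1 recovers the usual binary contrastive loss, while intermediate values give every pair a proportionate training signal.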