Visual Place Recognition (VPR) aims to robustly identify locations via image retrieval over descriptors encoded from environmental images. However, images captured from different viewpoints at the same location exhibit drastic appearance changes, which provide incoherent supervision signals for descriptor learning and severely hinder VPR performance. Previous work proposes classifying images by manually defined rules or ground-truth viewpoint labels, then training descriptors on the classification results. However, not all datasets carry ground-truth viewpoint labels, and manually defined rules may be suboptimal, leading to degraded descriptor performance. To address these challenges, we introduce mutual learning of viewpoint self-classification and VPR. Starting from a coarse classification based on geographical coordinates, we progress to a finer classification of viewpoints using simple clustering techniques. The dataset is thus partitioned in an unsupervised manner while a descriptor extractor for place recognition is trained simultaneously. Experimental results show that this approach partitions the dataset by viewpoint almost perfectly, so the two tasks reinforce each other. Our method even outperforms state-of-the-art (SOTA) methods that partition datasets using ground-truth labels.
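The "simple clustering techniques" mentioned above could, for instance, be instantiated with plain k-means over the global descriptors of images sharing a geographical location. The sketch below is illustrative only; the synthetic descriptors and function names are assumptions, not details from the paper:

```python
# Minimal sketch: unsupervised viewpoint partitioning of one place's images
# via k-means on their global descriptors (illustrative, not the paper's code).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Basic Lloyd's k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy "descriptors": two well-separated viewpoint groups at one location.
rng = np.random.default_rng(1)
front = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
rear = rng.normal(loc=5.0, scale=0.1, size=(20, 8))
X = np.vstack([front, rear])
labels = kmeans(X, k=2)
```

With well-separated viewpoint groups, each group ends up in its own cluster, mirroring the near-perfect viewpoint partitioning the abstract reports.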