Visual place recognition is commonly addressed as an image retrieval problem. However, retrieval methods scale poorly to large datasets densely sampled from city-wide maps, since the size of such datasets negatively impacts inference time. Using approximate nearest neighbour search for retrieval helps to mitigate this issue, at the cost of a drop in performance. In this paper we investigate whether the task can be effectively approached as a classification problem, thus bypassing the need for a similarity search. We find that existing classification methods for coarse, planet-wide localization are not suitable for the fine-grained, city-wide setting. This is largely due to how the dataset is split into classes: these methods are designed to handle a sparse distribution of photos, and as such they do not consider the visual aliasing across neighbouring classes that naturally arises in dense scenarios. We therefore propose a partitioning scheme that enables fast and accurate inference while preserving a simple learning procedure, together with a novel inference pipeline based on an ensemble of novel classifiers that uses the prototypes learned via an angular margin loss. Our method, Divide&Classify (D&C), combines the fast inference of classification solutions with an accuracy competitive with retrieval methods in the fine-grained, city-wide setting. Moreover, we show that D&C can be paired with existing retrieval pipelines to speed up computation by more than 20 times while increasing their recall, leading to new state-of-the-art results.
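To make the classification framing concrete, the following is a minimal sketch and not the D&C implementation. It assumes that classes are square cells on a metric grid and that each class has a learned prototype vector compared to the image embedding by cosine similarity. The names `cell_class`, `cosine_scores`, `margin_logits`, and `predict`, the 20 m cell size, and the margin and scale values are all hypothetical. The margin shown is an additive cosine margin in the style of CosFace, chosen only as one common member of the angular-margin family; the abstract does not say which variant D&C uses, nor how its partitioning and ensemble are built.

```python
import math


def cell_class(east_m, north_m, cell_size_m=20.0):
    """Map a metric position to the index pair of its square grid cell.

    Assumption: classes are axis-aligned square cells of side cell_size_m.
    """
    return (math.floor(east_m / cell_size_m), math.floor(north_m / cell_size_m))


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_scores(embedding, prototypes):
    """Cosine similarity between one embedding and every class prototype."""
    e = _normalize(embedding)
    return [sum(a * b for a, b in zip(e, _normalize(p))) for p in prototypes]


def margin_logits(cosines, target, margin=0.2, scale=30.0):
    """Training-time logits: subtract a margin from the target class cosine.

    Assumption: additive cosine margin (CosFace-style), used for illustration.
    """
    return [scale * (c - margin if i == target else c) for i, c in enumerate(cosines)]


def predict(embedding, prototypes):
    """Inference: return the index of the most similar prototype."""
    scores = cosine_scores(embedding, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, inference is a single pass over the class prototypes rather than a search over every database image, which is the source of the speed advantage the abstract attributes to classification. The margin is applied only during training, where it pushes embeddings of the same cell closer to their prototype.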