Visual place recognition (VPR) is usually treated as a specific image retrieval problem. Limited by existing training frameworks, most deep learning-based methods cannot extract sufficiently stable global features from RGB images and rely on a time-consuming re-ranking step to exploit spatial structural information for better performance. In this paper, we propose StructVPR, a novel training architecture for VPR that enhances structural knowledge in RGB global features and thus improves feature stability in a constantly changing environment. Specifically, StructVPR feeds segmentation images into a CNN as a more definitive source of structural knowledge and applies knowledge distillation so that neither online segmentation nor inference of the segmentation branch (seg-branch) is needed at test time. Because not all samples contain high-quality, helpful knowledge, and some even hurt distillation performance, we partition the samples and weight each sample's distillation loss to enhance the expected knowledge precisely. StructVPR achieves impressive performance on several benchmarks using only global retrieval and even outperforms many two-stage approaches by a large margin. With re-ranking added, it achieves state-of-the-art performance while maintaining a low computational cost.
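As a rough illustration of the sample-weighted distillation idea, the following minimal sketch partitions training samples into groups and scales each sample's distillation loss by a group weight. The abstract does not specify the partition rule, the weight values, or the form of the distillation loss, so everything here is an assumption made for illustration: the groups are defined by whether the seg-branch (teacher) and the RGB branch (student) each retrieve a sample correctly, the weights in `GROUP_WEIGHTS` are arbitrary, and the loss is a squared L2 distance between global features. All names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical group weights. The abstract does not state the actual
# partition rule or weight values; these are placeholders for illustration.
GROUP_WEIGHTS = {
    "teacher_only": 1.0,  # teacher correct, student wrong: knowledge most worth transferring
    "both": 0.5,          # both correct: mildly useful
    "neither": 0.1,       # both wrong: low-quality knowledge
    "student_only": 0.0,  # teacher wrong, student correct: distilling may hurt
}


def partition(teacher_correct: bool, student_correct: bool) -> str:
    """Assign a sample to a group (assumed rule, not taken from the paper)."""
    if teacher_correct and not student_correct:
        return "teacher_only"
    if teacher_correct and student_correct:
        return "both"
    if not teacher_correct and not student_correct:
        return "neither"
    return "student_only"


def feature_distance(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Squared L2 distance between two global feature vectors (assumed loss form)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher))


def weighted_distillation_loss(
    student_feats: List[Sequence[float]],
    teacher_feats: List[Sequence[float]],
    teacher_correct: List[bool],
    student_correct: List[bool],
) -> float:
    """Mean over the batch of (group weight) * (per-sample distillation loss)."""
    if not student_feats:
        raise ValueError("empty batch")
    total = 0.0
    for s, t, tc, sc in zip(student_feats, teacher_feats, teacher_correct, student_correct):
        total += GROUP_WEIGHTS[partition(tc, sc)] * feature_distance(s, t)
    return total / len(student_feats)
```

In this sketch a sample on which the teacher is wrong but the student is right contributes nothing to the loss, which mirrors the stated motivation that some samples can hurt distillation. In a real training loop the features would be tensors and the weighted loss would be added to the retrieval loss of the RGB branch.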