Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Recent studies show that vision models pre-trained on generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Because model pre-training and VPR differ inherently in training objectives and data, bridging this gap and fully unleashing the capability of pre-trained models for VPR remains a key open issue. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method that achieves both global and local adaptation efficiently: only lightweight adapters are tuned, and the pre-trained model itself is left unchanged. In addition, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures that proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms state-of-the-art methods with less training data and training time, and uses only about 3% of the retrieval runtime of two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.
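To make the idea of local matching by mutual nearest neighbors concrete, the following is a minimal sketch and not the released SelaVPR implementation. It assumes local features are plain non-zero vectors compared by cosine similarity, and that candidates are re-ranked by their number of mutual matches; the function names `mutual_nearest_neighbors` and `rerank`, and the scoring rule, are hypothetical choices made for illustration. A real system would use a tensor library on dense feature maps rather than Python lists.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mutual_nearest_neighbors(query_feats, cand_feats):
    """Return (i, j) pairs where query feature i and candidate feature j
    are each other's nearest neighbor under cosine similarity."""
    q = [l2_normalize(f) for f in query_feats]
    c = [l2_normalize(f) for f in cand_feats]
    # sim[i][j] is the cosine similarity of query feature i and candidate feature j.
    sim = [[sum(a * b for a, b in zip(qi, cj)) for cj in c] for qi in q]
    # Nearest candidate for each query feature, and nearest query for each candidate feature.
    nn_of_query = [max(range(len(c)), key=lambda j: row[j]) for row in sim]
    nn_of_cand = [max(range(len(q)), key=lambda i: sim[i][j]) for j in range(len(c))]
    # Keep a pair only if the nearest-neighbor relation holds in both directions.
    return [(i, j) for i, j in enumerate(nn_of_query) if nn_of_cand[j] == i]


def rerank(query_feats, candidates):
    """Order candidate names by number of mutual matches (an assumed scoring rule)."""
    scores = {
        name: len(mutual_nearest_neighbors(query_feats, feats))
        for name, feats in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D features: place_a matches both query features, place_b only one.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "place_b": [[1.0, 0.2], [1.0, 0.1]],
    "place_a": [[0.9, 0.1], [0.1, 0.9]],
}
ranking = rerank(query, candidates)
```

Counting mutual matches needs no geometric model fitting, which is consistent with the abstract's point that re-ranking can avoid RANSAC-style spatial verification; how the matches are actually scored in the paper is not stated in this abstract.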

