Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Recent studies show that vision models pre-trained on generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Because model pre-training and VPR differ inherently in their training objectives and data, how to bridge this gap and fully unleash the capability of pre-trained models for VPR remains a key open issue. To this end, we propose a novel method that realizes seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on the salient landmarks that discriminate places, we design a hybrid adaptation method that achieves global and local adaptation efficiently: only lightweight adapters are tuned, while the pre-trained model itself is left unchanged. In addition, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures that suitable dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms state-of-the-art methods with less training data and training time, and uses only about 3% of the retrieval runtime of two-stage VPR methods with RANSAC-based spatial verification. It ranked 1st on the MSLS challenge leaderboard at the time of submission. The code is released at https://github.com/Lu-Feng/SelaVPR.
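
The abstract states that re-ranking relies on matching dense local features directly rather than on RANSAC-based spatial verification. The sketch below illustrates that general idea under stated assumptions; it is not the SelaVPR code. It assumes local descriptors are compared by cosine similarity, that a match is kept only when two descriptors are each other's nearest neighbor (a mutual nearest neighbor), and that each top-k candidate from global retrieval is scored by its number of such matches. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a descriptor to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_sim_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities: rows index `a`, columns index `b`."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def mutual_nearest_neighbors(
    query_feats: Sequence[Vector], db_feats: Sequence[Vector]
) -> List[Tuple[int, int]]:
    """Return (query_idx, db_idx) pairs that are each other's nearest neighbor."""
    if not query_feats or not db_feats:
        return []
    sim = cosine_sim_matrix(query_feats, db_feats)
    # Nearest database descriptor for every query descriptor.
    q_to_d = [max(range(len(row)), key=row.__getitem__) for row in sim]
    # Nearest query descriptor for every database descriptor.
    d_to_q = [
        max(range(len(sim)), key=lambda i: sim[i][j]) for j in range(len(db_feats))
    ]
    # Keep a pair only if the nearest-neighbor relation holds in both directions.
    return [(i, j) for i, j in enumerate(q_to_d) if d_to_q[j] == i]


def rerank(
    query_local: Sequence[Vector], candidates: Sequence[Tuple[str, Sequence[Vector]]]
) -> List[Tuple[str, int]]:
    """Re-rank (candidate_id, local_feats) pairs by their mutual-NN match count.

    Python's sort is stable, so candidates with equal counts keep the order
    they had after global retrieval.
    """
    scored = [
        (cid, len(mutual_nearest_neighbors(query_local, feats)))
        for cid, feats in candidates
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    cands = [
        ("db_b", [[1.0, 0.0], [0.95, 0.05]]),  # both descriptors point toward query[0]
        ("db_a", [[0.9, 0.1], [0.1, 0.9]]),    # one descriptor per query descriptor
    ]
    print(rerank(query, cands))  # prints [('db_a', 2), ('db_b', 1)]
```

In the toy example, "db_b" yields only one mutual match because both of its descriptors prefer the same query descriptor, so "db_a" moves ahead of it. Counting mutual matches needs only similarity lookups, with no iterative geometric model fitting, which is the kind of saving the abstract attributes to dropping RANSAC.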
