Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, hierarchical VPR methods have received considerable attention due to their favorable trade-off between accuracy and efficiency. They typically first use global features to retrieve candidate images, then verify the spatial consistency of matched local features for re-ranking. However, the re-ranking stage usually relies on the RANSAC algorithm to fit a homography, which is time-consuming and non-differentiable, forcing existing methods to train the network only for global feature extraction. Here, we propose a transformer-based deep homography estimation (DHE) network that takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification. Moreover, we design a loss based on the re-projection error of inliers to train the DHE network without additional homography labels; it can also be jointly trained with the backbone network, helping the backbone extract features that are better suited to local matching. Extensive experiments on benchmark datasets show that our method outperforms several state-of-the-art methods and is more than one order of magnitude faster than mainstream hierarchical VPR methods that use RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR.
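The two-stage pipeline described above (global retrieval, then geometric verification by re-projection error under a homography) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the cosine-similarity retrieval, and the fixed pixel threshold are assumptions, and the homography `H` is taken as given here, whereas the paper predicts it with a transformer-based DHE network instead of fitting it with RANSAC.

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=3):
    # Stage 1 (hypothetical): rank database images by cosine similarity
    # between global descriptors and keep the top-k candidates.
    sims = db_descs @ query_desc / (
        np.linalg.norm(db_descs, axis=1) * np.linalg.norm(query_desc) + 1e-8)
    return np.argsort(-sims)[:k]

def inlier_reprojection_score(pts_q, pts_c, H, thresh=3.0):
    # Stage 2 (hypothetical): warp query keypoints with homography H and
    # count matched keypoints whose re-projection error is below `thresh`
    # pixels. Candidates are re-ranked by this inlier count; in the paper
    # H is predicted by the learnable DHE network, making this step
    # differentiable and fast.
    pts_h = np.hstack([pts_q, np.ones((len(pts_q), 1))])  # homogeneous coords
    proj = pts_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]                     # back to Euclidean
    errs = np.linalg.norm(proj - pts_c, axis=1)
    return int((errs < thresh).sum())
```

For example, with an identity homography and perfectly aligned keypoints, every match counts as an inlier; candidates with higher inlier counts would be ranked first.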