This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, which are crucial for real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on the one hand, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts and, on the other, that fusing information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed the Visual Distribution of Neuron Activations (VDNA) as a representation of image datasets. This representation naturally handles image sequences and provides a general, granular feature representation derived from a general-purpose model. Moreover, it is built by tracking neuron activation values over the images being represented and is not limited to a particular neural network layer, so it has access to both high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder that generates task-specific descriptors. Our experiments show that our representation can offer better robustness than current solutions to serious domain shifts away from the training data distribution, such as indoor environments and aerial imagery.
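The idea of tracking neuron activation values over a sequence of images can be illustrated with a minimal sketch. It assumes the activations have already been extracted from some general-purpose network and concatenated across layers into one list of values per image. The function names (`activation_histograms`, `flatten_descriptor`), the bin count, and the value range are hypothetical choices made for illustration, not the authors' implementation, and the learned encoder that maps this representation to a VPR descriptor is omitted.

```python
from typing import List, Sequence


def activation_histograms(
    activations: Sequence[Sequence[float]],
    n_bins: int = 8,
    lo: float = 0.0,
    hi: float = 1.0,
) -> List[List[float]]:
    """Build one normalised histogram per neuron over a sequence of images.

    activations[i][j] is the (assumed, pre-extracted) activation of
    neuron j for image i. Values outside [lo, hi] are clipped.
    """
    if not activations:
        raise ValueError("need at least one image")
    n_images = len(activations)
    n_neurons = len(activations[0])
    width = (hi - lo) / n_bins
    hists = [[0.0] * n_bins for _ in range(n_neurons)]
    for image_acts in activations:
        for j, value in enumerate(image_acts):
            clipped = min(max(value, lo), hi)
            # The top edge (value == hi) falls into the last bin.
            b = min(int((clipped - lo) / width), n_bins - 1)
            hists[j][b] += 1.0 / n_images
    return hists


def flatten_descriptor(hists: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-neuron histograms into one flat vector."""
    return [p for hist in hists for p in hist]


# Hypothetical activations: 3 images in a sequence, 2 tracked neurons.
sequence = [[0.1, 0.9], [0.2, 0.95], [0.6, 0.05]]
descriptor = flatten_descriptor(activation_histograms(sequence, n_bins=2))
```

Because each histogram is normalised by the number of images, a single image and a long sequence yield representations of the same size, which is one way to read the claim that the representation handles image sequences naturally.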