Aggregating Multiple Bio-Inspired Image Region Classifiers For Effective And Lightweight Visual Place Recognition

Visual place recognition (VPR) enables autonomous systems to localize themselves within an environment using image information. While VPR techniques built upon a Convolutional Neural Network (CNN) backbone dominate state-of-the-art VPR performance, their high computational requirements make them unsuitable for platforms equipped with low-end hardware. Recently, a lightweight VPR system based on multiple bio-inspired classifiers, dubbed DrosoNets, has been proposed, achieving high computational efficiency at the cost of reduced absolute place retrieval performance. In this work, we propose a novel multi-DrosoNet localization system, dubbed RegionDrosoNet, with significantly improved VPR performance while preserving a low computational profile. Our approach relies on specializing distinct groups of DrosoNets on differently sliced partitions of the original image, increasing extrinsic model differentiation. Furthermore, we introduce a novel voting module that combines the outputs of all DrosoNets into the final place prediction by considering multiple top reference candidates from each DrosoNet. RegionDrosoNet outperforms other lightweight VPR techniques when dealing with both appearance changes and viewpoint variations. Moreover, it competes with computationally expensive methods on some benchmark datasets at a small fraction of their online inference time.
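The idea of a voting module that considers several top reference candidates per classifier can be illustrated with a minimal sketch. The abstract does not specify the actual voting rule, so everything here is an assumption made for illustration: the function name `top_k_vote`, the parameter `k`, and the rank-based weighting (the best candidate of each classifier receives `k` votes, the next `k - 1`, and so on) are hypothetical and not taken from the paper.

```python
import numpy as np


def top_k_vote(scores: np.ndarray, k: int = 3) -> int:
    """Pick a reference place from per-classifier scores.

    scores has shape (n_classifiers, n_places); a higher score means the
    classifier considers that reference place a better match.
    The rank-based weighting below is an illustrative assumption, not the
    voting rule used by RegionDrosoNet.
    """
    n_classifiers, n_places = scores.shape
    k = min(k, n_places)
    votes = np.zeros(n_places)

    # For each classifier, indices of its k best places, best first.
    top = np.argsort(-scores, axis=1, kind="stable")[:, :k]

    # Rank 1 contributes k votes, rank 2 contributes k - 1, ..., rank k contributes 1.
    weights = np.arange(k, 0, -1)
    for row in top:
        votes[row] += weights

    # Ties are broken in favor of the lowest place index.
    return int(np.argmax(votes))


# Three hypothetical classifiers scoring four reference places.
scores = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.8, 0.9, 0.0],
    [0.0, 0.7, 0.1, 0.9],
])
print(top_k_vote(scores, k=1))  # only each classifier's single best candidate counts
print(top_k_vote(scores, k=2))  # second-best candidates also contribute
```

In this toy input, each classifier prefers a different place, but all three rank place 1 second. With `k=1` the vote is a three-way tie, whereas with `k=2` the consistent runner-up accumulates the most votes, which is the kind of agreement a single-candidate vote cannot capture.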

