Autonomous agents such as cars, robots, and drones need to localize themselves precisely in diverse environments, including GPS-denied indoor environments. One approach to precise localization is visual place recognition (VPR), which estimates the place depicted in an image based on previously seen places. State-of-the-art VPR models require large amounts of memory, making them unwieldy for mobile deployment, while more compact models lack robustness and generalization capabilities. This work overcomes these limitations for robotics by combining event-based vision sensors with a novel event-based guided variational autoencoder (VAE). The encoder of our model is based on a spiking neural network, which is compatible with power-efficient, low-latency neuromorphic hardware. The VAE successfully disentangles the visual features of 16 distinct places in our new indoor VPR dataset, with classification performance comparable to other state-of-the-art approaches, while also showing robust performance under various illumination conditions. When tested on novel visual inputs from unknown scenes, our model can distinguish between these places, demonstrating a high generalization capability through learning the essential features of a location. Our compact and robust guided VAE with generalization capabilities is a promising model for visual place recognition that can significantly enhance mobile robot navigation in known and unknown indoor environments.
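The abstract describes a guided VAE, i.e. a VAE whose latent space is additionally supervised by place labels. As a minimal illustrative sketch (not the paper's actual objective or architecture), a guided VAE training loss typically combines a reconstruction term, a KL regularizer toward a standard normal prior, and a cross-entropy guidance term on class logits derived from the latent code; the function name and the `beta`/`gamma` weights below are hypothetical:

```python
import numpy as np

def guided_vae_loss(x, x_recon, mu, logvar, logits, labels, beta=1.0, gamma=1.0):
    """Sketch of a guided VAE objective: reconstruction + KL + label guidance."""
    # Reconstruction term: mean squared error between input and reconstruction.
    recon = np.mean((x - x_recon) ** 2)
    # KL divergence between q(z|x) = N(mu, diag(exp(logvar))) and N(0, I).
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    # Guidance term: cross-entropy on class logits predicted from the latent
    # code (e.g. one class per place), computed with a stable softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return recon + beta * kl + gamma * ce
```

In practice the guidance term is what encourages the latent space to disentangle by place identity, since latents of the same place are pulled toward the same class region while the KL term keeps the representation compact.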