In recent years, event cameras have been successfully applied to visual place recognition (VPR) using deep artificial neural networks (ANNs). However, existing deep ANN architectures often fail to exploit the abundant temporal information in event streams. In contrast, deep spiking neural networks (SNNs) exhibit richer spatiotemporal dynamics and are inherently well suited to processing sparse, asynchronous event streams. Unfortunately, feeding temporally dense event volumes directly into a spiking network requires an excessive number of time steps, which makes training prohibitively expensive for large-scale VPR tasks. To address these issues, we propose Spike-EVPR, a novel deep spiking network architecture for event-based VPR. First, we introduce two novel event representations tailored to SNNs that exploit the spatiotemporal information in event streams while keeping GPU memory usage during training as low as possible. Second, to exploit the full potential of these two representations, we construct a Bifurcated Spike Residual Encoder (BSR-Encoder) that extracts high-level features from each of them. Third, we introduce a Shared & Specific Descriptor Extractor (SSD-Extractor), which extracts the features shared between the two representations as well as the features specific to each. Finally, we propose a Cross-Descriptor Aggregation Module (CDA-Module) that fuses these three features (one shared and two specific) into a refined, robust global descriptor of the scene. Experiments on the Brisbane-Event-VPR and DDD20 datasets show that Spike-EVPR outperforms several existing EVPR pipelines, increasing average Recall@1 by 7.61% on Brisbane-Event-VPR and 13.20% on DDD20.
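For readers unfamiliar with the reported metric, Recall@1 is the fraction of queries whose single best-matching database entry is a correct place match. The sketch below is only an illustration of that definition, not the evaluation code used for Spike-EVPR: the function names, the use of cosine similarity between global descriptors, and the toy ground-truth sets are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (assumed matching score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_1(query_descs, db_descs, ground_truth):
    """Fraction of queries whose top-1 retrieved database index is correct.

    ground_truth[i] is the set of database indices accepted as the same
    place as query i (how that set is built is dataset-specific and is
    not specified here).
    """
    if not query_descs:
        raise ValueError("at least one query descriptor is required")
    hits = 0
    for i, query in enumerate(query_descs):
        # Retrieve the single most similar database descriptor.
        best = max(
            range(len(db_descs)),
            key=lambda j: cosine_similarity(query, db_descs[j]),
        )
        if best in ground_truth[i]:
            hits += 1
    return hits / len(query_descs)


# Toy data (hypothetical 2-D descriptors): the third query retrieves
# database entry 0, but its accepted match is entry 2, so it counts as a miss.
db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
truth = [{0}, {1}, {2}]
toy_recall = recall_at_1(queries, db, truth)
```

In this toy setup two of the three queries retrieve an accepted match, so `toy_recall` is 2/3. A real evaluation would typically define the accepted matches with a dataset-specific distance or time tolerance.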