Event cameras offer high temporal resolution and dynamic range with minimal motion blur, making them promising for object detection tasks. Spiking Neural Networks (SNNs) are a natural match for event-based sensory data and enable ultra-energy-efficient, low-latency inference on neuromorphic hardware, whereas Artificial Neural Networks (ANNs) tend to display more stable training dynamics and faster convergence, resulting in better task performance. Hybrid SNN-ANN approaches are a promising alternative that makes it possible to leverage the strengths of both SNN and ANN architectures. In this work, we introduce the first Hybrid Attention-based SNN-ANN backbone for object detection using event cameras. We propose a novel Attention-based SNN-ANN bridge module that captures sparse spatial and temporal relations from the SNN layer and converts them into dense feature maps for the ANN part of the backbone. Experimental results demonstrate that our proposed method surpasses baseline hybrid and SNN-based approaches by significant margins, with results comparable to existing ANN-based methods. Extensive ablation studies confirm the effectiveness of our proposed modules and architectural choices. These results pave the way toward a hybrid SNN-ANN architecture that achieves ANN-like performance at a drastically reduced parameter budget. Finally, we implemented the SNN blocks on digital neuromorphic hardware to investigate latency and power consumption and to demonstrate the feasibility of our approach.
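The abstract does not specify how the bridge module is built, so the following is only a minimal sketch of the general idea of turning sparse spikes into a dense map, not the authors' design. It assumes a single-channel binary spike tensor indexed `[T][H][W]`, uses per-step spike counts as a hand-picked stand-in for learned attention scores, and the names `softmax` and `bridge` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bridge(spikes):
    """Collapse a binary spike tensor [T][H][W] into a dense [H][W] map.

    Assumption: attention scores are derived from spike counts rather than
    learned parameters, purely for illustration.
    """
    T, H, W = len(spikes), len(spikes[0]), len(spikes[0][0])

    # Temporal attention: score each time step by its total spike count,
    # then normalise the scores over time.
    scores = [float(sum(sum(row) for row in frame)) for frame in spikes]
    t_weights = softmax(scores)

    # Spatial attention: gate each pixel by a sigmoid of its firing rate.
    dense = []
    for y in range(H):
        row = []
        for x in range(W):
            # Attention-weighted sum over time gives a real-valued activation.
            value = sum(t_weights[t] * spikes[t][y][x] for t in range(T))
            rate = sum(spikes[t][y][x] for t in range(T)) / T
            gate = 1.0 / (1.0 + math.exp(-rate))
            row.append(value * gate)
        dense.append(row)
    return dense, t_weights


# Toy input: 3 time steps of a 2x2 event grid (1 = spike, 0 = no spike).
spikes = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 0], [0, 0]],
]
dense_map, weights = bridge(spikes)
```

In this toy version, pixels that never spike stay at zero while frequently firing pixels receive larger real-valued activations, which is the kind of dense input a conventional convolutional stage expects. A learned bridge would replace the count-based scores with trainable projections.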