Low-power event-driven computation and inherent temporal dynamics make spiking neural networks (SNNs) ideal candidates for processing the highly dynamic, asynchronous signals produced by event-based sensors. However, owing to training difficulties and architectural design constraints, SNNs have rarely been shown to be competitive with artificial neural networks (ANNs) in event-based dense prediction. In this work, we construct an efficient spiking encoder-decoder network for large-scale event-based semantic segmentation, optimizing the encoder with hierarchical search. To improve learning from highly dynamic event streams, we exploit the intrinsic adaptive threshold of spiking neurons to modulate network activation. Additionally, we develop a dual-path spiking spatially-adaptive modulation (SSAM) block that enhances the representation of sparse events and significantly improves network performance. Our network achieves 72.57% mean intersection over union (MIoU) on the DDD17 dataset and 57.22% MIoU on the recently proposed, larger DSEC-Semantic dataset, surpassing the best reported ANNs by 4% at a much lower computational cost. To the best of our knowledge, this is the first instance of SNNs outperforming ANNs in challenging event-based semantic segmentation tasks, demonstrating their strong potential in event-based vision. Our code will be publicly available.
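For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union can be computed from per-pixel class labels. It is an illustration only, not the evaluation code used in this work: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example. Benchmark evaluations typically accumulate intersections and unions over the whole test set before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (hypothetical flat layout chosen for this sketch).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)  # per-class IoU: 1/3, 2/3, 1/2
```

In the toy example the per-class values average to 0.5, which would be reported as 50% MIoU.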