Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite this potential, SNNs face challenges in training and architectural design, and as a result they still lag behind artificial neural networks (ANNs) on challenging event-based dense prediction tasks. In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation. To learn more efficiently from dynamic event streams, we employ an adaptive threshold, which improves network accuracy, sparsity, and robustness in streaming inference. Moreover, we develop a dual-path Spiking Spatially-Adaptive Modulation module, specifically tailored to enhance the representation of sparse events and multi-modal inputs, which considerably improves network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57\% on the DDD17 dataset and 58.32\% on the larger DSEC-Semantic dataset, results that are competitive with state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code will be made publicly available.
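For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is conventionally computed for semantic segmentation, assuming flattened per-pixel integer class labels. The function name `mean_iou`, the `ignore_label` parameter, and the toy labels are hypothetical and used for illustration only; this is not the evaluation code of SpikingEDN, and benchmark-specific details (such as which classes or pixels are excluded) may differ.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_label: Optional[int] = None) -> float:
    """Mean intersection over union over flattened per-pixel class labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        # Assumption: pixels carrying the ignore label are skipped entirely.
        if ignore_label is not None and t == ignore_label:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the counts are usually accumulated over the whole test set before the per-class ratios are taken, rather than averaged image by image.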