Spiking Neural Networks (SNNs), with their bio-inspired Leaky Integrate-and-Fire (LIF) neurons, inherently capture temporal information. This makes them well suited to sequential tasks such as processing event-based data from Dynamic Vision Sensors (DVS) and event-based speech tasks. Harnessing the temporal capabilities of SNNs requires mitigating vanishing spikes during training, capturing spatio-temporal patterns, and enhancing precise spike timing. To address these challenges, we propose TSkips, which augment SNN architectures with forward and backward skip connections that incorporate explicit temporal delays. These connections capture long-term spatio-temporal dependencies and facilitate better spike flow over long sequences. Introducing TSkips creates a vast search space of possible configurations, spanning skip positions and time-delay values. To navigate this search space efficiently, we use training-free Neural Architecture Search (NAS) to identify optimal network structures and their corresponding delays. We demonstrate the effectiveness of our approach on four event-based datasets: DSEC-flow for optical flow estimation, DVS128 Gesture for hand gesture recognition, and Spiking Heidelberg Digits (SHD) and Spiking Speech Commands (SSC) for speech recognition. Our method achieves significant improvements across these datasets: up to an 18% reduction in Average Endpoint Error (AEE) on DSEC-flow, an 8% increase in classification accuracy on DVS128 Gesture, and up to 8% and 16% higher classification accuracy on SHD and SSC, respectively.
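To make the idea of a skip connection with an explicit temporal delay concrete, the following is a minimal sketch, not the TSkips implementation. It assumes a discrete-time LIF neuron with a hard reset and a two-layer fully connected network in which the second layer also receives the input spikes from `delay` timesteps earlier (a forward skip). The names `lif_step`, `run_with_delayed_skip`, `w_skip`, `leak`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lif_step(v, current, leak=0.9, threshold=1.0):
    """One discrete LIF update: leak, integrate, fire, hard reset (assumed model)."""
    v = leak * v + current
    spikes = (v >= threshold).astype(v.dtype)
    v = v * (1.0 - spikes)  # reset the membrane potential of neurons that fired
    return v, spikes

def run_with_delayed_skip(x, w1, w2, w_skip, delay):
    """Two LIF layers with a forward skip from the input to layer 2.

    x:      (T, n_in) binary input spike train
    w1:     (n_in, n_hidden) weights into layer 1
    w2:     (n_hidden, n_out) weights into layer 2
    w_skip: (n_in, n_out) weights on the delayed skip path (hypothetical)
    delay:  number of timesteps by which the skip path lags
    Returns the (T, n_out) output spike train of layer 2.
    """
    T = x.shape[0]
    v1 = np.zeros(w1.shape[1])
    v2 = np.zeros(w2.shape[1])
    out = np.zeros((T, w2.shape[1]))
    for t in range(T):
        v1, s1 = lif_step(v1, x[t] @ w1)
        current2 = s1 @ w2
        if t - delay >= 0:
            # Skip path: input spikes from `delay` steps ago reach layer 2 now.
            current2 = current2 + x[t - delay] @ w_skip
        v2, s2 = lif_step(v2, current2)
        out[t] = s2
    return out

# Toy usage: layer 1 is silenced (zero weights), so layer 2 is driven
# only by the skip path and starts firing after `delay` timesteps.
x = np.ones((5, 1))
out = run_with_delayed_skip(
    x, w1=np.zeros((1, 1)), w2=np.ones((1, 1)), w_skip=np.ones((1, 1)), delay=2
)
```

In this toy setup the skip position (input to layer 2) and the delay value are fixed by hand; these are the kinds of choices the abstract describes as forming the search space. A backward skip would instead feed a later layer's spikes from an earlier timestep into an earlier layer, which requires a delay of at least one step.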