Spiking neural networks (SNNs) are a class of bio-inspired neural networks that promise to bring low-power and low-latency inference to edge devices through asynchronous and sparse processing. However, being temporal models, SNNs depend heavily on expressive states to generate predictions on par with classical artificial neural networks (ANNs). These states converge only after long transient periods and quickly decay without input data, leading to higher latency, higher power consumption, and lower accuracy. This work addresses this issue by initializing the state with an auxiliary ANN running at a low rate. The SNN then uses the state to generate predictions with high temporal resolution until the next initialization phase. Our hybrid ANN-SNN model thus combines the best of both worlds: it does not suffer from long state transients and state decay thanks to the ANN, and it can generate predictions with high temporal resolution, low latency, and low power thanks to the SNN. We show for the task of event-based 2D and 3D human pose estimation that our method consumes 88% less power with only a 4% decrease in performance compared to its fully ANN counterpart when run at the same inference rate. Moreover, when compared to SNNs, our method achieves a 74% lower error. This research thus provides a new understanding of how ANNs and SNNs can be combined to maximize their respective benefits.
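The scheduling idea described in the abstract, a low-rate ANN that periodically initializes the state and a high-rate SNN that updates it in between, can be illustrated with a minimal sketch. This is not the authors' architecture: the single-layer leaky integrate-and-fire update, the `tanh` stand-in for the auxiliary ANN, and every name and parameter here (`ann_init_state`, `snn_step`, `run_hybrid`, `init_every`, `decay`, `threshold`) are hypothetical choices made only for illustration.

```python
import numpy as np


def ann_init_state(frame, w_init):
    """Hypothetical stand-in for the auxiliary ANN.

    Maps a dense input (e.g. an accumulated event frame) to an initial
    membrane state in a single step, with no transient period.
    """
    return np.tanh(w_init @ frame)


def snn_step(state, events, w_in, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire update (assumed neuron model)."""
    # Leak the previous state, then integrate the weighted input events.
    state = decay * state + w_in @ events
    # Neurons whose membrane state reaches the threshold emit a spike.
    spikes = (state >= threshold).astype(state.dtype)
    # Soft reset: subtract the threshold from neurons that spiked.
    state = state - spikes * threshold
    return state, spikes


def run_hybrid(event_steps, frames, w_init, w_in, init_every):
    """Run the SNN at every step; re-initialize its state every `init_every` steps.

    `frames` must hold one dense input per initialization phase, i.e. at
    least ceil(len(event_steps) / init_every) entries.
    """
    state = np.zeros(w_in.shape[0])
    outputs, n_init = [], 0
    for t, events in enumerate(event_steps):
        if t % init_every == 0:
            # Low-rate path: the ANN overwrites the SNN state.
            state = ann_init_state(frames[t // init_every], w_init)
            n_init += 1
        # High-rate path: the SNN produces an output at every step.
        state, spikes = snn_step(state, events, w_in)
        outputs.append(spikes)
    return np.stack(outputs), n_init


# Usage with random toy data (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
n_in, n_hidden, n_steps, init_every = 8, 4, 20, 5
event_steps = rng.integers(0, 2, size=(n_steps, n_in)).astype(float)
frames = rng.normal(size=(n_steps // init_every, n_in))
w_init = rng.normal(scale=0.5, size=(n_hidden, n_in))
w_in = rng.normal(scale=0.3, size=(n_hidden, n_in))
outputs, n_init = run_hybrid(event_steps, frames, w_init, w_in, init_every)
```

In this toy setup the SNN emits an output at all 20 steps while the ANN stand-in runs only at the steps where `t % init_every == 0`, which mirrors the rate asymmetry the abstract describes. Calling `snn_step` with all-zero events shows the state shrinking by the factor `decay` at each step, the decay-without-input behavior that the periodic initialization is meant to counteract.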