Spiking neural networks (SNNs) are regarded as a promising third generation of neural networks and have attracted increasing attention across a wide range of applications. However, because SNNs contain a large number of synaptic connections, on-chip learning algorithms must perform intensive weight-update computation during training, which leads to substantial hardware resource utilization and energy consumption. Among existing SNN learning algorithms, spike-timing-dependent plasticity (STDP) is one of the most extensively studied and widely adopted, and it serves as a fundamental learning component in SNNs. To reduce the hardware and energy overheads of SNN training, this paper presents intrinsic-timing power-of-two STDP (ITP-STDP) together with a prototype learning-engine hardware architecture. The dynamics of the proposed design are analyzed with a dedicated mean-field synaptic drift model, and the design is validated on SNNs of different scales across several datasets. It is also implemented on both ASIC and FPGA platforms and compared with state-of-the-art approaches, including the original STDP and more complex STDP variants. Because algorithmic and hardware-level optimizations together eliminate most of the computational overhead of STDP, the proposed design achieves higher energy efficiency, higher operating speed, and substantially lower hardware resource utilization. On the FPGA platform, it improves energy efficiency by 4.5$\times$ to 219.8$\times$ over the compared designs. On the ASIC platform, it achieves a 4.8$\times$ to 22.01$\times$ speedup while occupying only 1.2% to 3.3% of the area required by prior works.
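To illustrate why a power-of-two formulation can remove most of STDP's arithmetic, the sketch below contrasts a standard pair-based STDP window, whose update decays as $e^{-|\Delta t|/\tau}$, with a generic power-of-two approximation in which each halving of the update is a single right shift. It relies on the identity $e^{-|\Delta t|/\tau} = 2^{-|\Delta t|/(\tau \ln 2)}$, so the window halves roughly every $\tau \ln 2$ time units. This is not the ITP-STDP formulation, which the abstract does not specify; the amplitudes `A_PLUS`/`A_MINUS`, the time constant `TAU`, the halving interval `SHIFT_STEP`, and both function names are hypothetical choices for illustration only.

```python
import math

# Hypothetical parameters for illustration only; none are taken from the paper.
A_PLUS = 64      # potentiation amplitude (fixed-point units)
A_MINUS = 64     # depression amplitude (fixed-point units)
TAU = 20.0       # time constant of the reference exponential window (ms)
SHIFT_STEP = 14  # ms per halving, approx. TAU * ln(2) = 13.86


def stdp_exponential(dt: float) -> float:
    """Reference pair-based STDP; dt = t_post - t_pre (ms).

    Needs an exponential (or lookup table) plus a multiply per update.
    """
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU)   # post after pre: potentiate
    if dt < 0:
        return -A_MINUS * math.exp(dt / TAU)  # pre after post: depress
    return 0.0


def stdp_pow2(dt: int) -> int:
    """Generic power-of-two approximation using only integer ops.

    The exponential decay is replaced by right-shifting the amplitude by
    round(|dt| / SHIFT_STEP), so the update needs no multiplier.
    """
    if dt == 0:
        return 0
    shift = (abs(dt) + SHIFT_STEP // 2) // SHIFT_STEP  # round to nearest halving
    magnitude = (A_PLUS if dt > 0 else A_MINUS) >> shift
    return magnitude if dt > 0 else -magnitude


if __name__ == "__main__":
    for dt in (-28, -14, -3, 3, 14, 28, 56):
        print(f"dt={dt:4d}  exp={stdp_exponential(dt):7.2f}  pow2={stdp_pow2(dt):4d}")
```

The shift-based version trades a small approximation error for replacing the exponential evaluation and multiplication with a comparison, an addition, and a barrel shift, which is the kind of saving the abstract attributes to combined algorithmic and hardware-level optimization.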