Spiking neural networks (SNNs) are powerful models of spatiotemporal computation and are well suited for deployment on resource-constrained edge devices and neuromorphic hardware due to their low power consumption. Leveraging attention mechanisms similar to those found in their artificial neural network counterparts, recently emerged spiking transformers have showcased promising performance and efficiency by capitalizing on the binary nature of spiking operations. Recognizing the current lack of dedicated hardware support for spiking transformers, this paper presents the first work on 3D spiking transformer hardware architecture and design methodology. We present an architecture and physical design co-optimization approach tailored specifically for spiking transformers. Through memory-on-logic and logic-on-logic stacking enabled by 3D integration, we demonstrate significant energy and delay improvements compared to conventional 2D CMOS integration.
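As an illustration of what "the binary nature of spiking operations" can buy in an attention layer, the following minimal sketch computes an attention score matrix from binary spike matrices using only comparisons and counting, and checks it against an ordinary multiply-accumulate product. It is not the architecture proposed in this paper: the function names (`spike_dot`, `spike_attention_scores`, `dense_scores`), the matrix sizes, and the random spike patterns are assumptions made for the example.

```python
import random


def spike_dot(a, b):
    """Dot product of two binary spike vectors.

    Because every entry is 0 or 1, the product reduces to counting the
    positions where both vectors fire (a logical AND followed by a count).
    """
    return sum(1 for x, y in zip(a, b) if x and y)


def spike_attention_scores(q, k):
    """Score matrix Q K^T for binary spike matrices given as lists of 0/1 rows."""
    return [[spike_dot(q_row, k_row) for k_row in k] for q_row in q]


def dense_scores(q, k):
    """Reference: the same product computed with ordinary multiply-accumulate."""
    return [[sum(x * y for x, y in zip(q_row, k_row)) for k_row in k] for q_row in q]


# Hypothetical sizes and spike patterns, chosen only for illustration.
rng = random.Random(0)
tokens, dim = 4, 8
q = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(tokens)]
k = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(tokens)]

scores = spike_attention_scores(q, k)

# The multiplication-free result matches the conventional product.
assert scores == dense_scores(q, k)
```

In hardware terms, the AND-and-count form suggests that multipliers could be replaced by gating and accumulation logic, which is one plausible source of the efficiency the abstract refers to.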