SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we have yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the RWKV language model, we successfully implement 'SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train three variants of the proposed model, with 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block, replacing multi-head self-attention with a mechanism whose computational complexity grows linearly, rather than quadratically, with sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while consuming 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
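
To make the idea of binary, event-driven activations over sequentially streamed inputs concrete, the following is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron. It is an illustration only and not the SpikeGPT implementation: the function name `lif_forward`, the decay factor `beta`, the firing `threshold`, and the soft-reset rule are all assumptions chosen for this example.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a sequence of inputs.

    inputs:    iterable of input currents, one per time step (hypothetical data)
    beta:      membrane decay factor per step (assumed value)
    threshold: firing threshold (assumed value)
    Returns a list of binary spikes (0 or 1), one per time step.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous membrane potential, then integrate the new input.
        membrane = beta * membrane + current
        # Emit a binary spike when the potential reaches the threshold.
        spike = 1 if membrane >= threshold else 0
        # Soft reset: subtract the threshold after a spike (an assumption).
        membrane -= spike * threshold
        spikes.append(spike)
    return spikes


spikes = lif_forward([0.6, 0.6, 0.0, 0.3, 0.9])
print(spikes)  # [0, 1, 0, 0, 1]
```

The neuron carries only a fixed-size state (its membrane potential) from one step to the next, so the work per input is constant and the total cost grows linearly with sequence length. The output is sparse and binary, which is the property that event-driven hardware can exploit.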
