Large language models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain, yet they require significantly more power and energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) that aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. We demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using implicit differentiation, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical to developing a scalable spiking LM. Moreover, the convergence of the average spiking rate at equilibrium is used to develop a novel ANN-SNN knowledge distillation technique in which a pre-trained BERT model serves as the "teacher" to train our "student" spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can potentially be extended to other kinds of LLMs. Our work is the first to demonstrate the performance of an operational spiking LM architecture on multiple tasks in the GLUE benchmark.
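To make the notion of an average spiking rate at equilibrium concrete, the following is a minimal sketch and not the architecture proposed in this paper. It assumes a single integrate-and-fire neuron with subtractive reset driven by a constant input; the function name `average_spiking_rate` and its parameters are hypothetical and chosen only for illustration. Under these assumptions, the time-averaged spike count settles to the input divided by the threshold, clipped to the range [0, 1], and it is a steady-state quantity of this kind that a training scheme based on implicit differentiation could operate on instead of the non-differentiable spikes themselves.

```python
def average_spiking_rate(input_current, threshold=1.0, num_steps=1000):
    """Simulate one integrate-and-fire neuron and return its mean firing rate.

    Assumptions (illustrative, not taken from the paper): constant input,
    no leak, at most one spike per time step, subtractive reset.
    """
    membrane = 0.0
    spikes = 0
    for _ in range(num_steps):
        membrane += input_current      # integrate the constant input
        if membrane >= threshold:      # fire when the threshold is reached
            spikes += 1
            membrane -= threshold      # subtractive ("soft") reset
    return spikes / num_steps


if __name__ == "__main__":
    # The average rate approaches clip(input / threshold, 0, 1).
    for current in (-0.5, 0.25, 0.6, 2.0):
        print(current, average_spiking_rate(current))
```

In this toy setting the rate for an input of 0.25 with a threshold of 1.0 is exactly 0.25 over 1000 steps, negative inputs give a rate of 0, and inputs above the threshold saturate at a rate of 1.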