Spiking Neural Networks (SNNs) offer a promising energy-efficient alternative to large language models (LLMs) due to their event-driven nature and ultra-low power consumption. However, to preserve model capacity, most existing spiking LLMs still rely on intensive floating-point matrix multiplication (MatMul) and nonlinearities, or suffer from training difficulties arising from complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), which eliminates softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns the ANN teacher and the SNN student across embeddings, attention maps, intermediate features, and output logits. The SpAD framework allows BiSpikCLM to reach performance comparable to its ANN counterparts using substantially fewer training tokens (e.g., only 5.6% of the tokens for the 1.3B model). As a result, BiSpikCLM achieves competitive performance on natural language generation tasks at only 4.16% to 5.87% of the computational cost. Our results highlight the feasibility and effectiveness of fully binary spike-driven LLMs and establish distillation as a promising pathway for brain-inspired spiking NLP.
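To make the four alignment levels named for SpAD concrete, the following is a minimal sketch of a multi-level distillation objective. It is not the paper's formulation: the abstract does not specify the loss terms, so the use of mean squared error for embeddings, attention maps, and intermediate features, a temperature-scaled KL divergence for the logits, the equal default weights, and all function and key names (`alignment_loss`, `emb`, `attn`, `feat`, `logits`) are assumptions made for illustration. The softmax here appears only in the training-time loss on logits, not in the student's attention.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length flat lists."""
    assert len(a) == len(b) and len(a) > 0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(teacher, student, weights=None, temperature=2.0):
    """Weighted sum of four teacher-student alignment terms.

    Hypothetical formulation: MSE on embeddings, attention maps, and
    intermediate features; KL divergence on temperature-softened logits.
    `teacher` and `student` are dicts of flat lists with the keys
    "emb", "attn", "feat", and "logits" (assumed names).
    """
    w = {"emb": 1.0, "attn": 1.0, "feat": 1.0, "logit": 1.0}
    if weights:
        w.update(weights)
    terms = {
        "emb": mse(teacher["emb"], student["emb"]),
        "attn": mse(teacher["attn"], student["attn"]),
        "feat": mse(teacher["feat"], student["feat"]),
        "logit": kl_divergence(
            softmax(teacher["logits"], temperature),
            softmax(student["logits"], temperature),
        ),
    }
    total = sum(w[name] * value for name, value in terms.items())
    return total, terms


# Toy usage with made-up activations (not data from the paper).
teacher_out = {"emb": [0.0, 1.0], "attn": [0.5, 0.5], "feat": [1.0, 2.0], "logits": [2.0, 0.0]}
student_out = {"emb": [1.0, 2.0], "attn": [0.5, 0.5], "feat": [1.0, 2.0], "logits": [0.0, 2.0]}
loss, per_term = alignment_loss(teacher_out, student_out)
```

Returning the per-term values alongside the total makes it easy to see which level of the student diverges most from the teacher; in a real training setup the same structure would operate on tensors rather than flat lists.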