Spiking neural networks (SNNs) offer a promising avenue for implementing deep neural networks in a more energy-efficient way. However, the architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared with mainstream Transformer-based networks such as BERT. To this end, we improve a recently proposed spiking Transformer (i.e., Spikformer) so that it can process language tasks, and we propose a two-stage knowledge distillation method for training it. In the first stage, the model is pre-trained by distilling knowledge from BERT on a large collection of unlabelled texts; in the second, it is fine-tuned on task-specific instances, again via knowledge distillation, this time from a BERT model fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve results comparable to BERT on text classification tasks in both English and Chinese, with much lower energy consumption. Our code is available at https://github.com/Lvchangze/SpikeBERT.
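The two-stage training scheme described above can be illustrated with a minimal sketch of the kinds of objectives a student model might minimize at each stage. This is not SpikeBERT's actual loss. The following choices are all assumptions borrowed from common knowledge-distillation practice: mean-squared error for aligning student and teacher features in stage 1 (pre-training on unlabelled text), and, in stage 2 (task-specific fine-tuning), a temperature-scaled KL term against the fine-tuned teacher's logits mixed with cross-entropy on the gold label (hypothetical parameters `alpha` and `temperature`). The function names are also hypothetical. Plain Python lists stand in for tensors so that the example runs without extra dependencies.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean-squared error between two feature vectors of equal length."""
    assert len(student) == len(teacher)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q), where p is the teacher distribution and q the student's."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def stage1_loss(student_feats: Sequence[Sequence[float]],
                teacher_feats: Sequence[Sequence[float]]) -> float:
    """Hypothetical stage-1 objective: average per-layer feature MSE
    between the spiking student and the pre-trained teacher on unlabelled text."""
    pairs = list(zip(student_feats, teacher_feats))
    return sum(mse(s, t) for s, t in pairs) / len(pairs)


def stage2_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Hypothetical stage-2 objective: soft-label KL against the fine-tuned
    teacher plus hard-label cross-entropy, weighted by alpha."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scale by T^2 so the soft-term magnitude stays comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    print(stage1_loss([[0.0, 0.0]], [[1.0, 1.0]]))
    print(stage2_loss([0.2, 1.1], [0.5, 2.0], label=1))
```

In an actual training loop, the student features and logits would come from the spiking model and the teacher's from BERT, with losses averaged over a batch and differentiated by an autograd framework. The sketch only shows how the two stages differ: stage 1 needs no labels, whereas stage 2 combines the fine-tuned teacher's soft predictions with the ground-truth label.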