Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the cost of such training, driven primarily by the models' large parameter counts, remains high. This paper proposes leveraging \emph{sparsity} in pre-trained LLMs to expedite this training process. By observing the sparsity of activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address the associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a $45\%$ throughput improvement in continual pre-training and a $38\%$ reduction in training time for supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.
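To make the two ingredients named above concrete, the sketch below illustrates, in plain Python, one plausible reading of (1) selecting active neurons by an importance score and (2) a "ladder" omission rate scheduler that increases the omission rate in discrete stages over training. All function names, the stepwise schedule, and the top-fraction selection rule are illustrative assumptions, not the released SAT implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of neuron selection + a ladder omission-rate scheduler.
# Names and the exact schedule shape are assumptions for illustration only.

from typing import List


def ladder_omission_rate(step: int, total_steps: int,
                         max_rate: float = 0.5, num_stages: int = 4) -> float:
    """Raise the neuron-omission rate in equal 'ladder' stages over training."""
    stage = min(num_stages - 1, step * num_stages // max(total_steps, 1))
    return max_rate * (stage + 1) / num_stages


def select_active_neurons(importance: List[float], omission_rate: float) -> List[int]:
    """Keep the most important (1 - omission_rate) fraction of neurons."""
    num_keep = max(1, int(round(len(importance) * (1.0 - omission_rate))))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:num_keep])


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.7, 0.05, 0.4, 0.8]  # toy per-neuron importance scores
    for step in (0, 250, 500, 750, 999):
        rate = ladder_omission_rate(step, total_steps=1000)
        print(f"step={step:4d}  omission_rate={rate:.3f}  "
              f"active={select_active_neurons(scores, rate)}")
```

In this toy schedule the omission rate starts small and steps up toward its maximum, so early training sees most neurons while later steps drop progressively more of the least important ones.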