Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities to pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process: in each generation, it produces multiple offspring models through mutation and selects the fittest for survival. To assess the effect of post-training, we incorporate a lightweight, multistep training process within the offspring population, progressively increasing the number of tokens and eliminating poorly performing models at each selection stage. We validate our method through extensive experiments on Llama-2-7B, Llama-3.1-8B, and Qwen-2.5-14B-Instruct, achieving state-of-the-art performance for structured pruning. For instance, DarwinLM surpasses ShearedLlama while requiring 5x less training data during post-compression training. Code is available at: https://github.com/IST-DASLab/DarwinLM
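The search procedure described above (mutate the current model into offspring, finetune each candidate on a progressively larger token budget, and discard the weakest at every selection stage) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `mutate`, `finetune`, and `fitness` callables, the token schedule, and the survivor counts are all hypothetical placeholders.

```python
def darwinlm_search(parent, fitness, mutate, finetune,
                    generations=10, offspring=8,
                    token_schedule=(10_000, 50_000, 100_000),
                    survivors_per_stage=(4, 2, 1)):
    """Hypothetical sketch of training-aware evolutionary pruning search.

    Each generation mutates the parent into a population of pruned
    candidates, then runs a multistep selection: every stage finetunes
    the survivors on a larger token budget and keeps only the fittest,
    so post-training behavior (not just zero-shot quality) drives
    selection. Returns the fittest model found.
    """
    for _ in range(generations):
        # Generate offspring by mutating the current parent structure.
        population = [mutate(parent) for _ in range(offspring)]
        # Progressive selection: more tokens, fewer survivors per stage.
        for tokens, keep in zip(token_schedule, survivors_per_stage):
            population = [finetune(m, tokens) for m in population]
            population.sort(key=fitness, reverse=True)
            population = population[:keep]
        parent = population[0]
    return parent
```

With a toy setup where "models" are scalars and finetuning adds a small token-proportional gain, the loop monotonically improves the surviving candidate, mirroring how staged selection filters out offspring that respond poorly to post-compression training.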