Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the performance lost by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or performed layer-wise weight updates that preserve each layer's behavior. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs. Code is available at https://github.com/fmfi-compbio/admm-pruning.
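To make the core idea concrete, the following is a minimal sketch (not the authors' exact implementation; the function name, penalty scaling, and iteration count are our assumptions) of a layer-wise ADMM weight update for a fixed pruning mask: minimize the layer reconstruction error ||XW - XW0||_F^2 subject to W vanishing outside the mask, by alternating a closed-form least-squares step with a projection onto the sparsity pattern.

```python
import numpy as np

def admm_weight_update(X, W0, M, rho_rel=1.0, iters=50):
    """Layer-wise weight update via ADMM (illustrative sketch).

    X  : (n, d) calibration inputs to the layer
    W0 : (d, k) original dense weights
    M  : (d, k) binary pruning mask (1 = keep, 0 = prune)
    """
    d = X.shape[1]
    H = X.T @ X                          # Hessian of the layer-wise objective
    G = H @ W0                           # X^T X W0, fixed across iterations
    rho = rho_rel * np.trace(H) / d      # penalty scaled to the problem (our choice)
    A_inv = np.linalg.inv(H + rho * np.eye(d))  # factor once, reuse each iteration
    Z = M * W0                           # feasible start: masked original weights
    U = np.zeros_like(W0)                # scaled dual variable
    for _ in range(iters):
        W = A_inv @ (G + rho * (Z - U))  # W-update: regularized least squares
        Z = M * (W + U)                  # Z-update: project onto the mask
        U = U + W - Z                    # dual ascent on the constraint W = Z
    return Z                             # Z is always feasible (respects the mask)
```

The Z iterates remain feasible throughout, so the returned weights exactly respect the pruning mask while (approximately) minimizing the layer's output reconstruction error; the paper additionally interleaves this update with gradual mask selection, which this sketch omits.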