Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to remove a portion of the network weights without compromising performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda either concentrate solely on weights or combine weights with activations to induce sparsity. However, they overlook the informative gradients that can be derived from pretrained LLMs. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion and operates in a training-free manner, using properly normalized gradients from a few calibration samples to determine the pruning importance score. It substantially outperforms competitive counterparts such as SparseGPT and Wanda on multiple benchmarks. Intriguingly, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, mirroring the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner requires no subsequent retraining or weight updates, keeping it as simple as its counterparts. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations), and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.
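
To make the idea of a gradient-based pruning score concrete, the following is a minimal sketch and not the released GBLM-Pruner code. The abstract does not give the exact score formula, so the sketch assumes one plausible reading: each weight is scored by its magnitude times the l2 norm of its gradient across calibration samples, and the lowest-scoring weights are zeroed within each output row. The function names `gradient_scores` and `prune_rows`, the row-wise comparison, and the toy numbers are all illustrative assumptions.

```python
import math


def gradient_scores(weights, grads_per_sample):
    """Score each weight as |W_ij| times the l2 norm of its gradient.

    Assumption: the l2 norm over calibration samples stands in for the
    "properly normalized gradients" mentioned in the abstract.
    """
    rows, cols = len(weights), len(weights[0])
    scores = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # l2 norm of this weight's gradient across calibration samples
            g_norm = math.sqrt(sum(g[i][j] ** 2 for g in grads_per_sample))
            scores[i][j] = abs(weights[i][j]) * g_norm
    return scores


def prune_rows(weights, scores, sparsity):
    """Zero the lowest-scoring fraction of weights within each row.

    No retraining or weight update follows; surviving weights are unchanged.
    """
    pruned = []
    for w_row, s_row in zip(weights, scores):
        n_drop = int(len(w_row) * sparsity)
        # Indices of the n_drop smallest scores in this row
        order = sorted(range(len(w_row)), key=lambda j: s_row[j])
        drop = set(order[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned


# Toy 2x4 weight matrix and gradients from two hypothetical calibration samples
weights = [[0.5, -2.0, 0.1, 1.0],
           [1.5, 0.2, -0.3, 0.05]]
grads = [
    [[3.0, 0.0, 4.0, 0.1], [0.0, 1.0, 2.0, 6.0]],
    [[4.0, 0.1, 3.0, 0.0], [0.1, 1.0, 2.0, 8.0]],
]

pruned = prune_rows(weights, gradient_scores(weights, grads), sparsity=0.5)
print(pruned)
```

In this toy example the first row keeps the small weights 0.5 and 0.1 because their gradients are large, whereas pruning by magnitude alone would have kept -2.0 and 1.0. This is the sense in which gradients carry information that weight size does not.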
