As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and needs only a small amount of calibration data to assess parameter importance. Previous research has focused primarily on designing advanced pruning methods, while the impact of different calibration data on pruning performance still lacks systematic exploration. We fill this gap and surprisingly observe that the choice of calibration data matters even more than the design of advanced pruning strategies, especially at high sparsity. Our preliminary exploration also reveals that using calibration data similar to the training data yields better performance. As the pre-training data of advanced LLMs is usually inaccessible, we further propose a self-generating calibration data synthesis strategy to construct feasible calibration data. Experiments on recent strong open-source LLMs (e.g., DCLM and LLaMA-3) show that the proposed method outperforms commonly used calibration data and effectively enhances strong pruning methods (e.g., Wanda, OWL).
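To make the self-generating idea concrete, here is a minimal sketch of how such calibration data could be sampled from the model itself, so that its distribution approximates the inaccessible pre-training data. It assumes a Hugging Face causal LM; the model name, sample count, sequence length, and sampling parameters are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: synthesize calibration sequences by sampling from the model itself.
# Assumptions (not from the paper): BOS-seeded generation, top-p sampling,
# and the specific hyperparameters below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # any open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def self_generate_calibration_data(n_samples=128, seq_len=2048):
    """Sample calibration sequences from the model's own distribution."""
    samples = []
    for _ in range(n_samples):
        # Start from the BOS token and let the model continue freely;
        # the continuation reflects the model's training distribution.
        seed = torch.tensor([[tokenizer.bos_token_id]], device=model.device)
        out = model.generate(
            seed,
            max_new_tokens=seq_len - 1,
            do_sample=True,
            top_p=0.95,
            temperature=1.0,
        )
        samples.append(out)
    return samples

# These sequences can then replace e.g. C4 excerpts as the calibration set
# fed to a post-training pruning method such as Wanda or OWL.
calib_data = self_generate_calibration_data(n_samples=8, seq_len=256)
```

In this sketch, calibration examples are drawn unconditionally from the pruned model's own generative distribution, which is the closest available proxy for its pre-training data when that corpus cannot be accessed directly.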