Large Language Models (LLMs) have become indispensable across a wide range of domains, but at the cost of substantial computational and memory resources. Model pruning addresses this by removing redundant components from models; block pruning in particular can deliver significant compression and inference acceleration. However, existing block pruning methods are often unstable and struggle to reach globally optimal solutions. In this paper, we propose MI-PRUN, a mutual-information-based pruning method for LLMs. Specifically, we leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. We further incorporate the Data Processing Inequality (DPI) to relate the importance of an entire contiguous group of blocks to that of its individual blocks. Moreover, we develop the Fast-Block-Select algorithm, which iteratively updates block combinations to reach a globally optimal solution while significantly improving efficiency. Extensive experiments across various models and datasets demonstrate the stability and effectiveness of our method.
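To make the core idea concrete, the following is a minimal illustrative sketch, not the MI-PRUN implementation: it scores each transformer block with a crude mutual-information estimate between the hidden states entering and leaving it, and flags the highest-scoring blocks (whose output carries nearly the same information as their input) as pruning candidates. The model name, the scalar projection of hidden states, the histogram-based MI estimator, and the pruning budget `k` are all simplifying assumptions for illustration.

```python
# Illustrative sketch only: rank transformer blocks by an MI proxy between
# their input and output hidden states, under simplifying assumptions.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"                          # assumed example model
CALIB_TEXTS = ["The quick brown fox jumps over the lazy dog."]   # tiny calibration set (assumption)

def histogram_mi(x: np.ndarray, y: np.ndarray, bins: int = 32) -> float:
    """Plug-in estimate of I(X; Y) from a 2-D histogram of two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

@torch.no_grad()
def block_redundancy_scores(model, tokenizer, texts):
    scores = None
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        hidden = model(**inputs, output_hidden_states=True).hidden_states
        per_block = []
        # hidden[i] / hidden[i+1] are the states before / after block i.
        for h_in, h_out in zip(hidden[:-1], hidden[1:]):
            # Project each token's hidden vector to a scalar (its L2 norm) so a
            # simple 2-D histogram MI estimate is feasible -- an illustrative proxy.
            x = h_in.float().norm(dim=-1).flatten().cpu().numpy()
            y = h_out.float().norm(dim=-1).flatten().cpu().numpy()
            per_block.append(histogram_mi(x, y))
        scores = per_block if scores is None else [a + b for a, b in zip(scores, per_block)]
    return scores

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")
    scores = block_redundancy_scores(model, tok, CALIB_TEXTS)
    k = 4  # assumed pruning budget
    # Blocks whose output is most predictable from their input are candidates to prune.
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    print("Pruning candidates (block indices):", candidates)
```

This sketch scores blocks independently; MI-PRUN instead uses the DPI to reason about contiguous groups of blocks and the Fast-Block-Select algorithm to search block combinations toward a global optimum.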