Post-training pruning is an effective approach for reducing the size and inference cost of large language models (LLMs), but existing methods often face a trade-off between pruning quality and computational efficiency. Heuristic pruning methods are efficient but sensitive to activation outliers, while reconstruction-based approaches improve fidelity at the cost of heavy computation. In this work, we propose a lightweight post-training pruning framework based on first-order statistical properties of model weights and activations. During pruning, channel-wise statistics are used to calibrate magnitude-based importance scores, reducing bias from activation-dominated channels. After pruning, we apply an analytic energy compensation to correct distributional distortions caused by weight removal. Both steps operate without retraining, gradients, or second-order information. Experiments across multiple LLM families, sparsity patterns, and evaluation tasks show that the proposed approach improves pruning performance while maintaining computational cost comparable to heuristic methods. The results suggest that simple statistical corrections can be effective for post-training pruning of LLMs.
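The two steps described above — calibrating magnitude-based importance scores with channel-wise activation statistics, then rescaling the surviving weights to restore per-row energy — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the calibration function (a square-root damping of per-channel activation norms), the unstructured top-k sparsity pattern, and the `prune_layer` helper are all assumptions for illustration.

```python
import numpy as np

def prune_layer(W, X, sparsity=0.5):
    """Hypothetical sketch of statistics-calibrated pruning with
    post-hoc energy compensation (details assumed, not from the paper).

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Channel-wise first-order statistic: per-input-channel activation norm.
    act_norm = np.linalg.norm(X, axis=0)

    # Calibrated importance score: damp activation-dominated channels
    # (sqrt is one possible choice) so outlier channels do not dominate
    # the plain magnitude criterion.
    score = np.abs(W) * np.sqrt(act_norm)

    # Remove the lowest-scoring fraction of weights (unstructured sparsity).
    k = int(W.size * sparsity)
    thresh = np.partition(score.ravel(), k)[k]
    mask = score >= thresh
    W_pruned = W * mask

    # Analytic energy compensation: rescale each output row so its
    # squared norm ("energy") matches that of the dense row, correcting
    # the distributional shift caused by weight removal.
    dense_energy = np.sum(W ** 2, axis=1, keepdims=True)
    pruned_energy = np.sum(W_pruned ** 2, axis=1, keepdims=True) + 1e-12
    W_comp = W_pruned * np.sqrt(dense_energy / pruned_energy)
    return W_comp, mask
```

Both steps use only first-order statistics of weights and activations, so the procedure needs no gradients, Hessian estimates, or retraining, keeping its cost close to purely heuristic magnitude pruning.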