The considerable layer-wise redundancy in large language models (LLMs) has made non-uniform sparsity allocation across layers the standard approach to efficient pruning. Existing allocation methods estimate layer-wise sparsity from local signals, such as activation outliers or weight spectra, and therefore capture mainly local layer importance; yet post-pruning performance also depends on the capacity of subsequent layers to compensate for the pruning error. In this paper, we characterize this compensatory capacity directly through controlled perturbation experiments, which yield two empirical findings. First, layers respond to pruning-scale perturbations in highly heterogeneous ways. In most cases, early layers amplify perturbations, whereas middle and late layers actively absorb them: the relative L2 drift of the hidden states decreases monotonically with depth, and their direction realigns with the unperturbed hidden-state trajectory. Second, absorption is a large-perturbation phenomenon. Under small perturbations, every layer amplifies, and the transition to absorption occurs smoothly as the perturbation magnitude grows to pruning scale. This finding extends the linearized accumulation theory underlying related work. Building on these findings, we define a per-layer absorption coefficient and propose absorption-aware correction, an orthogonal augmentation to existing allocation methods. At 70% sparsity, it improves OWL and AlphaPruning across multiple model families, reducing perplexity by 7.13% and raising zero-shot accuracy by 1.02%.
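The two diagnostics in the first finding, relative L2 drift and directional realignment, can be made concrete with a short sketch. It assumes that hidden states have already been collected from a clean and a perturbed forward pass as one vector per layer. The function names are hypothetical, and the layer-to-layer drift ratio is only an illustrative stand-in (above 1 suggests amplification, below 1 suggests absorption); it is not the paper's absorption coefficient, whose exact definition is not given here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
EPS = 1e-12  # guards against division by zero for all-zero states


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def relative_l2_drift(clean: Vector, perturbed: Vector) -> float:
    """Relative L2 drift ||h' - h|| / ||h|| of one layer's hidden state."""
    diff = [p - c for c, p in zip(clean, perturbed)]
    return _norm(diff) / (_norm(clean) + EPS)


def cosine_alignment(clean: Vector, perturbed: Vector) -> float:
    """Cosine similarity between perturbed and clean hidden states.

    Values rising toward 1 with depth indicate that the perturbed
    trajectory is realigning in direction with the clean one.
    """
    dot = sum(c * p for c, p in zip(clean, perturbed))
    return dot / (_norm(clean) * _norm(perturbed) + EPS)


def drift_ratios(clean_traj: List[Vector], pert_traj: List[Vector]) -> List[float]:
    """Hypothetical per-layer ratio d_{l+1} / d_l of relative drifts.

    A ratio above 1 means the layer amplified the perturbation;
    a ratio below 1 means it absorbed part of it.
    """
    drifts = [relative_l2_drift(c, p) for c, p in zip(clean_traj, pert_traj)]
    return [drifts[i + 1] / (drifts[i] + EPS) for i in range(len(drifts) - 1)]
```

For example, with clean states `[[1, 0], [2, 0], [4, 0]]` and perturbed states `[[1, 0.5], [2, 0.5], [4, 0.5]]`, the fixed offset shrinks relative to the growing hidden-state norm, so each ratio is about 0.5 and the cosine alignment increases with depth, which is the absorption pattern in miniature. Pure Python is used here only to keep the sketch dependency-free; real hidden states would typically be tensors.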