Pruning objectives for language models have recently expanded beyond accuracy and sparsity to include robustness. Even so, existing methods struggle to improve robustness against adversarial attacks as model sparsity increases, and they require a retraining process. With the advent of large language models, these issues become increasingly prominent. This paper proposes that the robustness of a language model is proportional to the extent of pre-trained knowledge it encompasses. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of the dense language model, aiming to conserve more pre-trained knowledge during pruning. In this setup, each layer's reconstruction error includes not only the error originating in that layer but also the cumulative error from preceding layers, and is followed by an adaptive rectification. Compared with other state-of-the-art baselines, our approach demonstrates a superior balance among accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets, marking a significant stride towards robust pruning in language models.
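
The idea of a layer-wise reconstruction error that also carries cumulative error from preceding layers can be illustrated with a small NumPy sketch. This is not the method of the paper: the toy ReLU layers, the magnitude-based mask, and the least-squares refit (used here as a stand-in for the adaptive rectification) are all assumptions made for illustration, and the function names `magnitude_mask`, `refit_masked`, and `prune_with_cumulative_error` are hypothetical. The point is only that each pruned layer receives the output of the already-pruned layers before it, while its reconstruction target comes from the dense model.

```python
import numpy as np


def magnitude_mask(W, sparsity):
    """Boolean mask that drops roughly a `sparsity` fraction of the smallest-magnitude weights.

    Assumption: plain magnitude pruning, chosen only to keep the sketch short.
    """
    k = int(round(W.size * sparsity))  # number of weights to drop
    if k == 0:
        return np.ones(W.shape, dtype=bool)
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.abs(W) > threshold


def refit_masked(X, Y, mask):
    """Least-squares refit of the kept weights, one output column at a time.

    Assumption: a simple stand-in for a rectification step.
    """
    W = np.zeros(mask.shape)
    for j in range(mask.shape[1]):
        keep = mask[:, j]
        if keep.any():
            W[keep, j] = np.linalg.lstsq(X[:, keep], Y[:, j], rcond=None)[0]
    return W


def prune_with_cumulative_error(weights, X, sparsity=0.5):
    """Prune a stack of toy ReLU layers, tracking reconstruction error per layer.

    The pruned layer sees activations from the pruned preceding layers, so its
    error against the dense output includes the error accumulated upstream.
    """
    x_dense, x_sparse = X, X
    pruned, errors = [], []
    for W in weights:
        target = x_dense @ W  # dense layer output (pre-activation)
        mask = magnitude_mask(W, sparsity)
        before = float(np.sum((x_sparse @ (W * mask) - target) ** 2))
        W_hat = refit_masked(x_sparse, target, mask)
        after = float(np.sum((x_sparse @ W_hat - target) ** 2))
        pruned.append(W_hat)
        errors.append((before, after))
        # Propagate both paths so the next layer inherits the remaining error.
        x_dense = np.maximum(target, 0.0)
        x_sparse = np.maximum(x_sparse @ W_hat, 0.0)
    return pruned, errors


rng = np.random.default_rng(0)
toy_weights = [rng.standard_normal((16, 16)) for _ in range(3)]
toy_inputs = rng.standard_normal((64, 16))
pruned_weights, layer_errors = prune_with_cumulative_error(toy_weights, toy_inputs)
for i, (before, after) in enumerate(layer_errors):
    print(f"layer {i}: error before refit = {before:.3f}, after refit = {after:.3f}")
```

Because the refit solves a least-squares problem over the same kept weights, the error after refitting cannot exceed the error of the masked original weights on the same inputs. Whether such a reduction carries over to robustness is the paper's claim and is not something this toy example demonstrates.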