Pruning objectives for language models have recently expanded beyond accuracy and sparsity to include robustness. However, existing methods struggle to improve robustness against adversarial attacks as model sparsity increases, and they require retraining. These limitations become more pressing in the era of large language models. This paper proposes that the robustness of a language model is proportional to the amount of pre-trained knowledge it retains. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of the dense language model, so that more pre-trained knowledge is preserved during pruning. In this setup, each layer's reconstruction error accounts not only for the error introduced at that layer but also for the error accumulated from preceding layers, and is then corrected by an adaptive rectification. Compared with other state-of-the-art baselines, our approach achieves a better balance among accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets, marking a significant step towards robust pruning of language models.
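The layer-wise reconstruction described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract specifies neither the pruning criterion nor the form of the adaptive rectification, so the sketch assumes magnitude-based masks and a least-squares refit of the surviving weights, and it uses a small stack of ReLU linear layers in place of BERT. The names `magnitude_mask`, `refit_row`, and `prune_layerwise` are hypothetical. The point it shows is that each pruned layer is fitted on the output of the already-pruned preceding layers while targeting the dense layer's output, so the error it corrects includes the error accumulated upstream.

```python
import numpy as np


def magnitude_mask(W, sparsity):
    """Boolean mask that keeps the largest-magnitude entries of W.

    Assumption: magnitude pruning stands in for the paper's criterion.
    """
    k = int(round(W.size * sparsity))  # number of weights to drop
    if k == 0:
        return np.ones(W.shape, dtype=bool)
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.abs(W) > threshold


def refit_row(x_in, target, keep):
    """Least-squares refit of one output unit, restricted to kept inputs.

    Assumption: this refit stands in for the 'adaptive rectification'.
    """
    w = np.zeros(x_in.shape[1])
    if keep.any():
        solution = np.linalg.lstsq(x_in[:, keep], target, rcond=None)[0]
        w[keep] = solution
    return w


def prune_layerwise(weights, X, sparsity=0.5):
    """Prune a stack of ReLU linear layers one layer at a time.

    weights: list of arrays of shape (d_out, d_in); X: array (n, d_in).
    """
    dense_in, pruned_in = X, X
    pruned = []
    for W in weights:
        dense_out = dense_in @ W.T  # target: the dense layer's output
        mask = magnitude_mask(W, sparsity)
        # Fit on pruned_in, not dense_in, so the refit also absorbs the
        # error accumulated in the preceding pruned layers.
        W_hat = np.stack([
            refit_row(pruned_in, dense_out[:, j], mask[j])
            for j in range(W.shape[0])
        ])
        pruned.append(W_hat)
        dense_in = np.maximum(dense_out, 0.0)
        pruned_in = np.maximum(pruned_in @ W_hat.T, 0.0)
    return pruned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((8, 6)), rng.standard_normal((4, 8))]
    calib = rng.standard_normal((64, 6))  # toy calibration inputs
    for W, W_hat in zip(layers, prune_layerwise(layers, calib)):
        print(W.shape, "zero fraction:", np.mean(W_hat == 0))
```

Fitting against the dense target on pruned inputs is the design choice that matters here. Fitting each layer on dense inputs would treat the layers as independent and leave upstream error uncorrected.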