As deep learning (DL) models are increasingly integrated into everyday life, ensuring their safety by making them robust against adversarial attacks has become critical. DL models are susceptible to adversarial attacks, in which small, targeted perturbations of the input data are crafted to disrupt model behavior. Adversarial training has been proposed as a mitigation strategy that yields more robust models, but this robustness comes with the additional computational cost of generating adversarial attacks during training. The two objectives -- adversarial robustness and computational efficiency -- therefore appear to be in conflict with each other. In this work, we explore the effects of two model compression methods -- structured weight pruning and quantization -- on adversarial robustness. We specifically examine the effects of fine-tuning on compressed models and present the trade-off between standard fine-tuning and adversarial fine-tuning. Our results show that compression does not inherently lead to a loss in robustness, and that adversarial fine-tuning of a compressed model can yield large improvements in its robustness. We present experiments on two benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness comparable to that of adversarially trained models, while also improving computational efficiency.
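To make the compress-then-adversarially-fine-tune pipeline concrete, the following is a minimal sketch, not the paper's exact implementation: it applies structured (channel-wise) weight pruning to a small CNN and then fine-tunes the pruned model on FGSM adversarial examples in PyTorch. The architecture, pruning amount, epsilon, and optimizer settings are illustrative assumptions.

```python
# Minimal sketch (assumed hyperparameters): structured pruning followed by
# adversarial fine-tuning with single-step FGSM attacks.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


class SmallCNN(nn.Module):
    """Illustrative CNN for 3x32x32 inputs (e.g., CIFAR-10-sized images)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(torch.flatten(x, 1))


def structured_prune(model: nn.Module, amount: float = 0.5) -> nn.Module:
    # Remove whole output channels (dim=0) with the smallest L2 norm.
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # make the pruning permanent
    return model


def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 8 / 255) -> torch.Tensor:
    # Craft a one-step FGSM adversarial example on the fly.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()


def adversarial_finetune(model: nn.Module, loader, epochs: int = 5,
                         lr: float = 1e-3) -> nn.Module:
    # Fine-tune the compressed model on adversarial examples.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = fgsm_example(model, x, y)
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model
```

In this sketch, quantization would be applied after fine-tuning, for example with PyTorch's post-training dynamic quantization (`torch.quantization.quantize_dynamic`) on the linear layers, since gradient-based fine-tuning of an already-quantized model requires quantization-aware training instead.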