TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization

Recent years have seen the ever-increasing importance of pre-trained models and their downstream training in deep learning research and applications. At the same time, defenses against adversarial examples have mainly been investigated in the context of training from random initialization on simple classification tasks. To better exploit the potential of pre-trained models for adversarial robustness, this paper focuses on fine-tuning an adversarially pre-trained model on various classification tasks. Since existing research has shown that a robust pre-trained model has already learned a robust feature extractor, the crucial question is how to maintain the robustness of the pre-trained model while learning the downstream task. We study model-based and data-based approaches to this goal and find that these two common approaches cannot improve both generalization and adversarial robustness. We therefore propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework, which consists of two neural networks, one of which keeps the population means and variances of the pre-training data in its batch normalization layers. Besides transferring robust information, TWINS increases the effective learning rate without hurting training stability, because it breaks the relationship between a weight's norm and its gradient norm that holds in a standard batch normalization layer. This results in a faster escape from the sub-optimal initialization and alleviates robust overfitting. Finally, TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness. Our code is available at https://github.com/ziquanliu/CVPR2023-TWINS.
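The effective-learning-rate argument rests on a known property of standard batch normalization. A unit normalized with batch statistics is invariant to the scale of its incoming weights, so its gradient norm is inversely proportional to the weight norm. The toy sketch below (pure Python, one normalized unit with a 2-D weight vector) checks this numerically. It also shows that normalizing with fixed pre-training statistics, as the frozen wing does, removes that invariance.

Everything in the sketch is an illustrative assumption, not the TWINS implementation from the linked repository. This includes the helper names, the toy data, the hypothetical population statistics `pop_stats`, and the choice in `twins_loss` to share `w` across both wings and sum their losses.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_stats(zs):
    """Mean and biased variance of a batch of pre-activations."""
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / len(zs)
    return mean, var


def normalize(zs, mean, var):
    # No epsilon term, so the scale-invariance check below is exact.
    return [(z - mean) / math.sqrt(var) for z in zs]


def loss(w, xs, ys, stats=None):
    """Mean squared error of one normalized unit z = w . x.

    stats=None    -> adaptive wing: normalize with current batch statistics.
    stats=(m, v)  -> frozen wing: normalize with fixed pre-training statistics.
    """
    zs = [dot(w, x) for x in xs]
    mean, var = batch_stats(zs) if stats is None else stats
    outs = normalize(zs, mean, var)
    return sum((o - y) ** 2 for o, y in zip(outs, ys)) / len(ys)


def twins_loss(w, xs, ys, pop_stats):
    # Hypothetical combination: both wings share w and their losses are summed.
    return loss(w, xs, ys) + loss(w, xs, ys, pop_stats)


def grad(f, w, h=1e-6):
    """Central finite-difference gradient of f at w."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g


def norm(v):
    return math.sqrt(sum(t * t for t in v))


# Toy batch, targets and hypothetical pre-training mean/variance.
xs = [(1.0, 0.5), (-0.3, 2.0), (0.8, -1.2), (2.0, 0.1)]
ys = [1.0, -0.5, 0.2, 0.7]
pop_stats = (0.1, 1.5)
w = [0.6, -0.4]
w_big = [3 * t for t in w]  # same direction, three times the norm


def adaptive(v):
    return loss(v, xs, ys)


def frozen(v):
    return loss(v, xs, ys, pop_stats)


def combined(v):
    return twins_loss(v, xs, ys, pop_stats)


# Batch-statistics BN: same loss, and the gradient norm shrinks by the factor 3.
print(adaptive(w), adaptive(w_big))
print(norm(grad(adaptive, w)) / norm(grad(adaptive, w_big)))  # approximately 3
# Frozen statistics (and hence the combined loss) depend on the scale of w.
print(frozen(w), frozen(w_big))
print(combined(w), combined(w_big))
```

The frozen wing's loss depends on the norm of `w`, so the combined objective no longer ties the gradient norm to the inverse of the weight norm. The abstract's argument about a larger effective learning rate builds on this decoupling. The sketch does not model the training dynamics themselves.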