Existing work shows that augmenting the training data of neural networks with both clean and adversarial examples can enhance their generalizability under adversarial attacks. However, this training approach often degrades performance on clean inputs. It also requires frequent re-training of the entire model to account for new attack types, which is computationally costly. These limitations make adversarial training less practical, particularly for complex Pre-trained Language Models (PLMs) with millions or even billions of parameters. To overcome these challenges while still harnessing the theoretical benefits of adversarial training, this study combines two concepts: (1) adapters, which enable parameter-efficient fine-tuning, and (2) Mixup, which trains neural networks via convex combinations of data pairs. Intuitively, we propose to fine-tune PLMs through convex combinations of non-data pairs, namely pairs of fine-tuned adapters, one trained on clean examples and the other on adversarial examples. Our experiments show that, compared with the baselines, the proposed method achieves the best trade-off between training efficiency and predictive performance, both with and without attacks, on a variety of downstream tasks.
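As a rough illustration of the convex-combination idea, the sketch below mixes two adapters parameter by parameter. It assumes the mixing happens directly in weight space with a single coefficient `lam` in [0, 1]; the abstract does not state how the combination is computed or how the coefficient is chosen. The function `mix_adapters`, the parameter names `down_proj` and `up_proj`, and all values are hypothetical.

```python
from typing import Dict, List

# An adapter is modeled here as a mapping from parameter name to a flat
# list of weights. This is an assumption made only for illustration.
Params = Dict[str, List[float]]


def mix_adapters(clean: Params, adversarial: Params, lam: float) -> Params:
    """Return lam * clean + (1 - lam) * adversarial, parameter by parameter."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    if clean.keys() != adversarial.keys():
        raise ValueError("adapters must share the same parameter names")
    mixed: Params = {}
    for name, clean_values in clean.items():
        adv_values = adversarial[name]
        if len(clean_values) != len(adv_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        mixed[name] = [
            lam * c + (1.0 - lam) * a for c, a in zip(clean_values, adv_values)
        ]
    return mixed


# Toy adapters with hypothetical parameter names and values.
clean_adapter = {"down_proj": [1.0, 2.0], "up_proj": [0.0, 4.0]}
adv_adapter = {"down_proj": [3.0, 0.0], "up_proj": [2.0, 0.0]}

mixed_adapter = mix_adapters(clean_adapter, adv_adapter, lam=0.5)
print(mixed_adapter)
```

Under this assumption, `lam=1.0` recovers the clean adapter and `lam=0.0` recovers the adversarial one, so the coefficient sets the balance between clean and adversarial behavior without touching the frozen PLM weights.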