Deep neural models (e.g. Transformer) naturally learn spurious features, which create a ``shortcut'' between the labels and inputs, thus impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g. BERT). We propose the \textit{Adversarial Self-Attention} mechanism (ASA), which adversarially biases the attentions to suppress the model's reliance on specific features (e.g. particular keywords) and encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains compared to naive training for a larger number of steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.