Despite the success of adversarial training (AT) on top of pre-trained language models (PLMs) in the text domain, our empirical study shows that AT yields inconsistent gains on some tasks, e.g. commonsense reasoning and named entity recognition. This paper investigates AT from the perspective of the contextualized language representation output by PLM encoders. We find that current AT attacks tend to generate sub-optimal adversarial examples that can fool the decoder part but have only a minor effect on the encoder. However, we find that effectively deviating the encoder is necessary for AT to yield gains. Based on this observation, we propose the simple yet effective \textit{Contextualized representation-Adversarial Training} (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation of the encoder. This allows a global optimization of adversarial examples that can fool the entire model. We also find that CreAT gives a better direction for optimizing the adversarial examples, making them less sensitive to hyperparameters. Compared to AT, CreAT produces consistent performance gains on a wider range of tasks and is shown to be more effective for language pre-training, where only the encoder part is kept for downstream tasks. We achieve new state-of-the-art performance on a series of challenging benchmarks, e.g. AdvGLUE (59.1 $\rightarrow$ 61.1), HellaSWAG (93.0 $\rightarrow$ 94.9), ANLI (68.1 $\rightarrow$ 69.3).
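The contrast between a standard AT attack and an attack that also deviates the encoder representation can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-layer `tanh` encoder, the linear "decoder" head, the squared L2 deviation term, its weight `tau`, the finite-difference gradients, and all function names. The abstract does not give the actual CreAT objective, so this only shows the general idea of adding an encoder-deviation term to the attack objective.

```python
import math
import random

def encode(W, x):
    """Toy encoder: one tanh layer standing in for a PLM encoder (assumption)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def task_loss(W, v, x, y):
    """Logistic loss of a linear head (the 'decoder') on the encoding; y is -1 or +1."""
    logit = sum(vi * hi for vi, hi in zip(v, encode(W, x)))
    return math.log1p(math.exp(-y * logit))

def deviation(W, x, x_adv):
    """Squared L2 distance between clean and perturbed encodings (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(encode(W, x_adv), encode(W, x)))

def num_grad(f, z, h=1e-5):
    """Central finite-difference gradient of f at z (keeps the sketch dependency-free)."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = z[:], z[:]
        z_plus[i] += h
        z_minus[i] -= h
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad

def project(delta, eps):
    """Project delta back onto the L2 ball of radius eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return delta
    return [d * eps / norm for d in delta]

def attack(W, v, x, y, eps=0.5, lr=0.1, steps=10, tau=0.0, seed=0):
    """Projected gradient ascent on the perturbation delta.

    tau = 0 maximizes only the task loss (a plain AT-style attack);
    tau > 0 additionally rewards deviating the encoder output.
    """
    rng = random.Random(seed)
    # Small random start so the deviation term has a non-zero gradient.
    delta = project([0.01 * rng.uniform(-1.0, 1.0) for _ in x], eps)

    def objective(d):
        x_adv = [xi + di for xi, di in zip(x, d)]
        return task_loss(W, v, x_adv, y) + tau * deviation(W, x, x_adv)

    for _ in range(steps):
        g = num_grad(objective, delta)
        g_norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
        delta = project([d + lr * gi / g_norm for d, gi in zip(delta, g)], eps)
    return delta

# Hypothetical toy weights and a single example.
W = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]
v = [1.0, -1.0]
x, y = [0.2, -0.1, 0.4], 1

delta_at = attack(W, v, x, y, tau=0.0)     # task loss only
delta_creat = attack(W, v, x, y, tau=1.0)  # task loss plus encoder deviation
```

Comparing `deviation(W, x, [xi + di for xi, di in zip(x, delta_at)])` with the same quantity for `delta_creat` gives a rough sense of how much each perturbation moves the encoder output in this toy setting. It says nothing about the behavior of real PLMs.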