On Robustness of Finetuned Transformer-based NLP Models

From arXiv: 16 pages, 8 figures. To be published in the proceedings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP 2023), Singapore. Long paper.

Transformer-based pretrained models like BERT, GPT-2 and T5 have been finetuned for a large number of natural language processing (NLP) tasks, and have been shown to be very effective. However, what changes across layers in these models during finetuning, relative to the pretrained checkpoints, is under-studied. Further, how robust are these models to perturbations in input text? Does the robustness vary depending on the NLP task for which the models have been finetuned? While there exists some work on studying the robustness of BERT finetuned for a few NLP tasks, there is no rigorous study that compares this robustness across encoder-only, decoder-only and encoder-decoder models. In this paper, we characterize changes between pretrained and finetuned language model representations across layers using two metrics: CKA and STIR. Further, we study the robustness of three language models (BERT, GPT-2 and T5) with eight different text perturbations on classification tasks from the General Language Understanding Evaluation (GLUE) benchmark, and generation tasks like summarization, free-form generation and question generation. GPT-2 representations are more robust than BERT and T5 across multiple types of input perturbation. Although models exhibit good robustness broadly, dropping nouns or verbs, or changing characters, has the largest impact. Overall, this study provides valuable insights into perturbation-specific weaknesses of popular Transformer-based models, which should be kept in mind when passing inputs to them. We make the code and models publicly available [https://github.com/PavanNeerudu/Robustness-of-Transformers-models].
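
The abstract names CKA (centered kernel alignment) as one of the two layer-wise similarity metrics but does not define it. As a rough illustration only, the sketch below computes the linear-kernel variant of CKA between two representation matrices, each with one row per input example. The choice of a linear kernel, the function names (`center_columns`, `gram`, `frobenius_inner`, `linear_cka`) and the toy matrices are assumptions made for this example; they are not taken from the paper or its repository, and the authors' implementation may differ.

```python
import math
import random


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def gram(m):
    """Linear-kernel Gram matrix: dot product between every pair of examples."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a, b):
    """Sum of elementwise products of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def linear_cka(x, y):
    """Linear CKA between x (n x d1) and y (n x d2); rows are the same n examples."""
    k = gram(center_columns(x))
    l = gram(center_columns(y))
    return frobenius_inner(k, l) / math.sqrt(
        frobenius_inner(k, k) * frobenius_inner(l, l)
    )


# Toy stand-ins for one layer's activations (hypothetical data, not model outputs).
rng = random.Random(0)
pretrained = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
rescaled = [[2.0 * v for v in row] for row in pretrained]  # same geometry, new scale
unrelated = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]

print(linear_cka(pretrained, rescaled))   # close to 1: CKA ignores isotropic scaling
print(linear_cka(pretrained, unrelated))  # somewhere between 0 and 1
```

In a layer-wise comparison of the kind the abstract describes, `x` and `y` would hold the activations of the same layer from the pretrained and the finetuned model on a shared set of inputs. A score near 1 would suggest that finetuning left that layer's representation geometry largely unchanged.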
