Language models based on the Transformer architecture have shown strong performance in natural language processing. However, fine-tuning pre-trained language models (PLMs) on downstream tasks still suffers from problems such as over-fitting and representation collapse. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates these problems by perturbing the hidden representations of Transformer layers. Unlike previous works that add noise only to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making Transformer layers more robust to perturbations of their hidden representations can further benefit the fine-tuning of PLMs as a whole. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and it both outperforms and is compatible with previous state-of-the-art fine-tuning techniques.
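The core idea, perturbing the hidden representation passed between layers during fine-tuning, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the zero-mean Gaussian noise, the `noise_scale` value, the choice to inject noise at the input of every layer, and the toy `linear_relu` function standing in for a Transformer layer. The names `perturb`, `linear_relu`, and `forward` are hypothetical.

```python
import random

def perturb(hidden, scale, rng):
    """Add zero-mean Gaussian noise to each element (assumed noise type)."""
    return [h + rng.gauss(0.0, scale) for h in hidden]

def linear_relu(hidden, weights):
    """Toy stand-in for a Transformer layer: dense map followed by ReLU."""
    return [max(0.0, sum(w * h for w, h in zip(row, hidden))) for row in weights]

def forward(x, layers, noise_scale=0.0, training=False, seed=0):
    """Run x through the layer stack, perturbing each layer's input
    only when training is True and noise_scale is positive."""
    rng = random.Random(seed)
    hidden = x
    for weights in layers:
        if training and noise_scale > 0.0:
            hidden = perturb(hidden, noise_scale, rng)
        hidden = linear_relu(hidden, weights)
    return hidden

# Two identity layers keep the arithmetic easy to follow.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layers = [identity, identity]
x = [1.0, 2.0, 3.0]

clean = forward(x, layers)                                   # no perturbation
noisy = forward(x, layers, noise_scale=0.1, training=True)   # perturbed hidden states
print(clean)
print(noisy)
```

In this sketch the perturbation is switched off outside training, so evaluation uses the unmodified forward pass; whether the original method does the same is not stated in the abstract.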