On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code

Pre-trained language models (PLMs) have become a prevalent technique in deep learning for code, using a two-stage pre-training and fine-tuning procedure to acquire general knowledge about code and then specialize in a variety of downstream tasks. However, the dynamic nature of software codebases poses a challenge to the effectiveness and robustness of PLMs. In particular, real-world scenarios can lead to significant differences between the distributions of the pre-training data and the test data, i.e., distribution shift, which degrades the PLM's performance on downstream tasks. In this paper, we stress the need to adapt PLMs of code to software data whose distribution changes over time, a crucial problem that has been overlooked in previous work. The motivation of this work is to consider the PLM in a non-stationary environment, where the fine-tuning data evolves over time according to a software evolution scenario. Specifically, we design a scenario in which the model needs to learn from a stream of programs containing new, unseen APIs over time. We study two widely used PLM architectures, a GPT2 decoder and a RoBERTa encoder, on two downstream tasks, API call prediction and API usage prediction. We demonstrate that the most commonly used fine-tuning technique from prior work is not robust enough to handle the dynamic nature of APIs, leading to the loss of previously acquired knowledge, i.e., catastrophic forgetting. To address this issue, we implement five continual learning approaches, including replay-based and regularization-based methods. Our findings demonstrate that these straightforward methods effectively mitigate catastrophic forgetting in PLMs across both downstream tasks while achieving comparable or superior performance.
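To make the idea of a replay-based method concrete, the following is a minimal sketch of one generic variant: a fixed-size memory filled by reservoir sampling, whose contents are mixed into each batch of new examples. It is an illustration under stated assumptions, not the implementation evaluated in the paper; the names `ReservoirReplayBuffer` and `mixed_batches`, and the `batch_size` and `replay_size` parameters, are hypothetical.

```python
import random
from typing import Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Fixed-size memory of past examples (hypothetical helper).

    Reservoir sampling keeps every example seen so far with equal
    probability, without storing the whole stream.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0  # number of examples offered to the buffer so far
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Replace a stored item with probability capacity / seen.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item

    def sample(self, k: int) -> List[T]:
        # Return at most k stored examples, drawn without replacement.
        k = min(k, len(self.items))
        return self._rng.sample(self.items, k)


def mixed_batches(
    stream: Iterable[T],
    buffer: ReservoirReplayBuffer[T],
    batch_size: int,
    replay_size: int,
) -> Iterator[List[T]]:
    """Yield batches of new examples followed by replayed old ones."""
    batch: List[T] = []
    for example in stream:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch + buffer.sample(replay_size)
            # Store the new examples only after they have been used.
            for seen_example in batch:
                buffer.add(seen_example)
            batch = []
    if batch:  # trailing partial batch
        yield batch + buffer.sample(replay_size)
        for seen_example in batch:
            buffer.add(seen_example)
```

In a fine-tuning loop, each yielded batch would be tokenized and passed to the model as usual; the replayed examples are what expose the model to earlier APIs while it learns the new ones.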

