Continual pre-training is the paradigm in which pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, namely mode connectivity and functional similarity. Based on the corresponding findings, we propose both an initialization-based method and a distillation-based method for our task. We demonstrate their feasibility in improving convergence and performance when tuning the upgraded PLM. We also show that the two methods can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.