We propose a new paradigm for continually evolving pretrained models, denoted ColD Fusion. It provides the benefits of multitask learning but leverages distributed computation with limited communication and eliminates the need for shared data. Consequently, ColD Fusion can give rise to a synergistic loop in which finetuned models are recycled to continually improve the pretrained model they are based upon. We show that ColD Fusion yields benefits comparable to multitask training by producing a model that (a) attains strong performance on all of the datasets it was trained on; and (b) is a better starting point for finetuning on unseen datasets. We further show that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, a ColD Fusion-based model outperforms RoBERTa by 2.33 points on average without any changes to the architecture.
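The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the exact procedure: the abstract does not specify how finetuned models are merged back into the shared model, so the sketch assumes element-wise parameter averaging, and the names `fuse`, `cold_fusion_loop`, and `make_toy_finetuner` are hypothetical. Each contributor sees only the shared weights and its own private objective, so no data is exchanged.

```python
from typing import Callable, Dict, List

# A model is represented as: parameter name -> flat list of values.
Weights = Dict[str, List[float]]
Finetuner = Callable[[Weights], Weights]


def fuse(models: List[Weights]) -> Weights:
    """Merge finetuned models by element-wise averaging.

    ASSUMPTION: averaging is an illustrative choice of fusion operator;
    the abstract does not state which operator is used.
    """
    if not models:
        raise ValueError("need at least one model to fuse")
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def cold_fusion_loop(
    shared: Weights, finetuners: List[Finetuner], iterations: int
) -> Weights:
    """Repeat: contributors finetune the shared model, then results are fused."""
    for _ in range(iterations):
        # Each contributor trains independently on its own (private) data.
        contributions = [finetune(shared) for finetune in finetuners]
        # Only weights are communicated back; the fused model is the new start.
        shared = fuse(contributions)
    return shared


def make_toy_finetuner(target: float, lr: float = 0.5) -> Finetuner:
    """Hypothetical stand-in for finetuning: move each weight toward `target`."""
    def finetune(weights: Weights) -> Weights:
        return {
            name: [v + lr * (target - v) for v in vals]
            for name, vals in weights.items()
        }
    return finetune


if __name__ == "__main__":
    start = {"w": [0.0]}
    contributors = [make_toy_finetuner(1.0), make_toy_finetuner(3.0)]
    print(cold_fusion_loop(start, contributors, iterations=20))
```

In this toy setting the shared weight drifts toward a compromise between the contributors' targets, which mirrors the claim that the evolved model performs well across all contributing datasets.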