Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, few works have explored interpolating between more than two models, each with a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework first creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each expert model on the target task, approaching it from several distinct knowledge bases. Finally, we linearly interpolate between the finetuned models' weights, using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the method's effectiveness on FETA-Friends, where it outperforms the standard pretrain-finetune approach.
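The abstract does not say which gradient-free optimizer is used or how the interpolation coefficients are parameterized. The following is a minimal sketch of the final step under stated assumptions. Each finetuned expert is represented as a dict of flat parameter lists. The coefficients are kept non-negative and summing to 1 via a softmax, which is an assumption. A simple (1+1) evolution strategy stands in for the gradient-free optimizer and is not necessarily the one used in DFWE. The names `interpolate`, `search_alphas`, and `val_loss` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
# Real models would store tensors (e.g. a state dict), but the arithmetic is the same.
Weights = Dict[str, List[float]]


def softmax(z: List[float]) -> List[float]:
    """Map unconstrained scores to non-negative mixing weights that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(models: List[Weights], alphas: List[float]) -> Weights:
    """Linear weight-space ensemble: theta = sum_i alpha_i * theta_i, per parameter."""
    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(a * m[name][j] for a, m in zip(alphas, models)) for j in range(size)
        ]
    return merged


def search_alphas(
    models: List[Weights],
    val_loss: Callable[[Weights], float],
    steps: int = 200,
    sigma: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Derivative-free search over interpolation weights.

    Assumption: a (1+1) evolution strategy on softmax scores. It only queries
    val_loss on merged models and never computes gradients.
    """
    rng = random.Random(seed)
    z = [0.0] * len(models)  # start from the uniform average of all experts
    best = val_loss(interpolate(models, softmax(z)))
    for _ in range(steps):
        # Propose a Gaussian perturbation of the current scores.
        cand = [v + rng.gauss(0.0, sigma) for v in z]
        loss = val_loss(interpolate(models, softmax(cand)))
        if loss < best:  # accept only if validation loss improves
            z, best = cand, loss
    return softmax(z)


if __name__ == "__main__":
    # Toy usage: three hypothetical "finetuned experts" with one 2-d parameter,
    # and a made-up validation loss (squared distance to a target vector).
    experts = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [0.0, 0.0]}]
    target = [0.6, 0.3]

    def toy_loss(m: Weights) -> float:
        return sum((a - b) ** 2 for a, b in zip(m["w"], target))

    alphas = search_alphas(experts, toy_loss)
    print("interpolation weights:", [round(a, 3) for a in alphas])
```

The softmax parameterization keeps every candidate a convex combination of the experts, so the search never leaves the region spanned by the finetuned models. A population-based optimizer could replace the (1+1) loop without changing `interpolate`.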