In multi-domain learning, a single model is trained on data from diverse domains to leverage shared knowledge and improve generalization. The order in which data from these domains is presented during training can significantly affect the model's performance on each domain, yet this dependence remains under-studied. In this paper, we investigate the influence of training order (or data mixing) in multi-domain learning using the Lie bracket of gradient vector fields. By analyzing the infinitesimal effects of changing the training order, we identify regions of the parameter space where swapping the order of two training domains benefits the target loss. We validate the predictions of our theoretical framework on both a toy example and bilingual LLM pre-training.
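As a minimal sketch of the identity behind this analysis (the notation here is ours, not taken from the paper): for two domain losses $L_A$ and $L_B$ with descent fields $X = -\nabla L_A$ and $Y = -\nabla L_B$, two consecutive gradient steps of size $\varepsilon$, taken in opposite orders, differ at leading order by the Lie bracket of the fields:

\[
\theta_{A \to B} - \theta_{B \to A}
  = \varepsilon^{2}\,[X, Y](\theta) + O(\varepsilon^{3}),
\qquad
[X, Y] = (DY)X - (DX)Y
       = \nabla^{2} L_B\,\nabla L_A - \nabla^{2} L_A\,\nabla L_B,
\]

where $\theta_{A \to B}$ denotes the parameters after a step on domain $A$ followed by a step on domain $B$. Under this reading, training order matters at a point $\theta$ exactly where the bracket is nonzero, and its alignment with the target-loss gradient indicates whether reordering the two domains helps or hurts.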