In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the \textit{language mixture ratios}. Multilingual scaling laws can predict the test loss under different language mixture ratios and can therefore be used to estimate the optimal ratios. However, existing approaches to multilingual scaling laws do not account for the \textit{cross-lingual transfer} effect, resulting in suboptimal mixture ratios. In this paper, we model multilingual pretraining as a cooperative game in which each language acts as a player that jointly contributes to pretraining, receiving the resulting reduction in test loss as its payoff. From the perspective of cooperative game theory, we then quantify the cross-lingual transfer from each language by its contribution in the game, and propose a game-theoretic multilingual scaling law called \textit{ShapleyLaw}. Our experiments show that ShapleyLaw outperforms baseline methods in both model performance prediction and language mixture optimization.
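To make the game-theoretic contribution measure concrete, the following is a minimal sketch assuming the per-language contribution is the classical Shapley value; the specific payoff function $v$ is our reading of the description above, and the exact form used by ShapleyLaw may differ:
\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,\Bigl(v\bigl(S \cup \{i\}\bigr) - v(S)\Bigr),
\]
where $N$ is the set of languages (players) and $v(S)$ denotes the payoff of coalition $S$, here interpreted as the reduction in test loss obtained when pretraining jointly on the languages in $S$.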