Multilingual large language models (MLLMs), trained on balanced multilingual data, demonstrate better zero-shot learning performance in non-English languages than large language models trained on English-dominant data. However, the performance gap between English and non-English languages remains a challenge that has yet to be fully addressed. A distinctive characteristic of MLLMs is their high-quality translation capability, which indicates that they have acquired the ability to align representations across languages. This study explores how to enhance the zero-shot performance of MLLMs in non-English languages by leveraging this alignment capability between English and non-English languages. To this end, we first analyze the behavior of MLLMs during translation and find that there are large-magnitude features that play a critical role in the translation process. Inspired by these findings, we retain the weights involved in operations on these large-magnitude features and prune the remaining weights, forcing MLLMs to rely on these features for tasks beyond translation. We empirically demonstrate that this pruning strategy improves MLLMs' performance in non-English languages.
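As a rough illustration of the pruning idea described above (a minimal sketch, not the paper's exact procedure), the snippet below zeroes out weight columns that do not act on the largest-magnitude feature dimensions. The function name `prune_except_large_magnitude`, the `feature_magnitudes` input (assumed to be precomputed, e.g., from hidden-activation statistics on translation data), and the `keep_ratio` parameter are all hypothetical names introduced here for illustration.

```python
import torch

def prune_except_large_magnitude(weight: torch.Tensor,
                                 feature_magnitudes: torch.Tensor,
                                 keep_ratio: float = 0.1) -> torch.Tensor:
    """Keep only the weight columns that read from the top-k
    large-magnitude feature dimensions; zero out the rest.

    weight: [out_dim, in_dim] linear-layer weight matrix.
    feature_magnitudes: [in_dim] precomputed magnitude score per
        input feature (assumed given; how to compute it is the
        subject of the paper's analysis).
    """
    k = max(1, int(keep_ratio * feature_magnitudes.numel()))
    keep_dims = torch.topk(feature_magnitudes, k).indices
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask[:, keep_dims] = True  # retain operations on large-magnitude features
    return weight * mask  # all other weights are pruned to zero
```

In this sketch the pruning is a hard binary mask per input dimension; the actual method may select weights at a finer granularity, but the key design choice is the same: weights tied to the large-magnitude features are preserved so the model must route non-translation tasks through them.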