Large Language Models (LLMs) demonstrate strong machine translation capabilities for languages they are trained on. However, the impact of factors beyond training data size on translation performance remains debated, especially for languages not directly encountered during training. Our study examines Llama2's translation capabilities. By modeling a linear relationship between linguistic feature distances and machine translation scores, we ask whether there are better central languages for LLMs than English. Our experiments show that the 7B Llama2 model achieves BLEU scores above 10 when translating into every language it has seen, a threshold it rarely reaches for languages it has not seen. Most translation improvements into unseen languages come from scaling up model size rather than from instruction tuning or increasing the shot count. Furthermore, our correlation analysis reveals that syntactic similarity is not the only linguistic factor strongly correlated with machine translation scores. Interestingly, we find that under specific circumstances some languages (e.g., Swedish, Catalan), despite having significantly less training data, exhibit correlation levels comparable to English. These insights challenge the prevailing landscape of LLMs, suggesting that models centered on languages other than English could provide a more efficient foundation for multilingual applications.