Large Language Models (LLMs) demonstrate strong capabilities across many tasks, including machine translation. Our study evaluates Llama2's machine translation capabilities and explores how translation quality depends on the languages in its training data. Our experiments show that the 7B Llama2 model achieves a BLEU score above 10 for all languages it has seen, but not always for languages it has not seen. For those unseen languages, the largest gains come from increasing model scale rather than from using chat versions or increasing the shot count. Furthermore, our linguistic distance analysis reveals that syntactic similarity is not always the primary linguistic factor in determining translation quality. Interestingly, we found that under specific circumstances, some languages, despite having significantly less training data than English, exhibit strong correlations comparable to those of English. These findings offer new perspectives on the current landscape of LLMs and raise the possibility that LLMs centered around languages other than English may offer a more effective foundation for a multilingual model.