Large language models (LLMs) have demonstrated strong multilingual capabilities, yet they remain mostly English-centric because their training corpora are imbalanced. Existing work leverages this phenomenon to improve multilingual performance on NLP tasks. In this work, we extend the evaluation from NLP tasks to real user queries. We find that although translating into English can improve the performance of English-centric LLMs on multilingual NLP tasks, it may not be optimal in all scenarios. For culture-related tasks that require deep language understanding, prompting in the native language proves more promising because it can capture nuances of culture and language. We therefore advocate for more effort toward developing strong multilingual LLMs rather than only English-centric ones.