Large Language Models (LLMs) have demonstrated exceptional natural language understanding and have excelled at a variety of natural language processing (NLP) tasks in recent years. Although most LLMs are trained predominantly on English, multiple studies have demonstrated comparable performance in many other languages. However, fundamental questions persist regarding how LLMs acquire their multilingual abilities and how performance varies across languages. These questions are crucial for the study of LLMs, since users and researchers often come from diverse language backgrounds, which may influence how they use and interpret LLMs' results. In this work, we propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. We investigate the phenomenon of cross-language generalization in LLMs, wherein insufficient multilingual training data nonetheless leads to advanced multilingual capabilities. To accomplish this, we employ a novel back-translation-based prompting method. The results show that GPT exhibits highly translation-like behaviour in multilingual settings.