Multilingual natural language processing is receiving increasing attention, with numerous models, benchmarks, and methods being released for many languages. English is often used in multilingual evaluation to prompt language models (LMs), mainly to overcome the lack of instruction-tuning data in other languages. In this position paper, we lay out two roles of English in multilingual LM evaluations: as an interface and as a natural language. We argue that these roles have different goals: task performance versus language understanding. We highlight this discrepancy with examples from datasets and evaluation setups. Numerous works explicitly use English as an interface to boost task performance. We recommend moving away from this imprecise method and instead focusing on furthering language understanding.