Multilingual language models have gained significant attention in recent years, enabling the development of applications that serve diverse linguistic contexts. In this paper, we present a comprehensive evaluation of three popular multilingual language models: mBERT, XLM-R, and GPT-3. We assess their performance across a diverse set of languages on two distinct tasks, text classification and text generation, with a focus on understanding how resource availability (general and model-specific), language family, script type, and word order affect model performance. Our findings reveal that while the amount of language-specific pretraining data plays a crucial role in model performance, other factors, including general resource availability, language family, and script type, are also important. We hope that our study contributes to a deeper understanding of multilingual language models and helps enhance their performance across languages and linguistic contexts.