The focus of language model evaluation has shifted toward reasoning and knowledge-intensive tasks, driven by advances in pretraining large models. While state-of-the-art models are partially trained on large Arabic corpora, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present \datasetname{}, the first multi-task language understanding benchmark for the Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed in collaboration with native speakers in the region. Our comprehensive evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models. Notably, BLOOMZ, mT0, LLaMA2, and Falcon struggle to reach a score of 50%, while even the top-performing Arabic-centric model achieves only 62.3%.