Automatic Arabic Dialect Identification (ADI) of text has gained considerable popularity since it was introduced in the early 2010s. Multiple datasets have been developed, and shared tasks have run yearly since 2018. However, ADI systems are reported to fail at distinguishing between the micro-dialects of Arabic. We argue that the currently adopted framing of ADI as a single-label classification problem is one of the main reasons for this failure. We highlight the incompleteness of the dialect labels as a limitation and demonstrate how it affects the evaluation of ADI systems. A manual error analysis of the predictions of an ADI system, performed by 7 native speakers of different Arabic dialects, revealed that $\approx$ 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.
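To illustrate how incomplete dialect labels can distort evaluation, the following minimal sketch scores the same predictions against a single gold label per sentence and against a set of valid dialects per sentence. The dialect codes, the toy data, and the function names `single_label_accuracy` and `multi_label_accuracy` are hypothetical and chosen only for illustration; they are not taken from any ADI dataset or from the authors' evaluation code.

```python
def single_label_accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the single gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(predictions)


def multi_label_accuracy(predictions, valid_label_sets):
    """Fraction of predictions that fall inside the set of valid dialects."""
    correct = sum(p in valid for p, valid in zip(predictions, valid_label_sets))
    return correct / len(predictions)


# Hypothetical toy data: four sentences, one predicted dialect each.
predictions = ["EGY", "LEV", "GLF", "MAG"]

# Single-label gold standard: each sentence carries exactly one dialect.
gold_single = ["EGY", "GLF", "GLF", "EGY"]

# Multi-label gold standard: the second sentence is assumed to be valid
# in both GLF and LEV, so predicting LEV is not a true error.
gold_multi = [{"EGY"}, {"GLF", "LEV"}, {"GLF"}, {"EGY"}]

single = single_label_accuracy(predictions, gold_single)
multi = multi_label_accuracy(predictions, gold_multi)
print(f"single-label accuracy: {single:.2f}")
print(f"multi-label accuracy:  {multi:.2f}")
```

In this toy setting, one of the two apparent errors under single-label scoring disappears once the sentence is allowed to belong to more than one dialect, which is the kind of effect the manual error analysis describes.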