Arabic dialects have long been under-represented in Natural Language Processing (NLP) research because they lack standardization and vary widely, both of which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in the Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.