In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of large language models (LLMs) and Arabic-specific datasets has renewed interest in this area. In this work, we investigate LLM-based approaches to the automated evaluation of Arabic texts, focusing on both automatic short answer grading (ASAG) and automated essay scoring (AES). We further introduce a structured taxonomy comprising five dimensions: application domain, feedback generation capability, the LLM architecture deployed, alignment with competency reference frameworks, and prompt engineering strategy. Applying this taxonomy, we conduct a comparative analysis of existing studies, examining their methodological approaches, datasets, evaluation metrics, and reported performance. The findings highlight the need for sustained, pedagogically grounded research in Arabic ATS, given its significance for improving educational quality across Arabic-speaking communities.