We introduce ArabiGEE, the first comprehensive Arabic grammatical error explanation (GEE) taxonomy grounded in explicit error types. Unlike existing GEE approaches that treat explanation generation as free-form text, ArabiGEE organizes grammatical explanations through a hierarchical structure spanning orthographic, morphological, syntactic, and lexical dimensions. The taxonomy consists of 27 error types, 140 correction types, and 324 associated explanations. We apply ArabiGEE to manually annotate portions of existing Arabic grammatical error correction corpora and demonstrate how structured grammatical explanations can support automatic evaluation of LLMs on Arabic GEE. Our code and data are publicly available.
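As an illustration of how explanations organized in a hierarchy could support automatic evaluation, the following is a minimal sketch, not the released ArabiGEE code. The class `Explanation`, the function `agreement`, the exact-match scoring, and all label strings (`ERR_A`, `CORR_A1`, and so on) are hypothetical placeholders; the abstract does not specify the label names or the evaluation metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    """One structured explanation: a path through a three-level hierarchy.

    The field names are assumptions made for this sketch.
    """
    dimension: str        # e.g. orthographic, morphological, syntactic, lexical
    error_type: str       # placeholder label, not an actual ArabiGEE type
    correction_type: str  # placeholder label, not an actual ArabiGEE type
    text: str             # free-text explanation attached to the path

    def key(self, depth: int) -> tuple:
        """Return the first `depth` levels of the path (1 to 3)."""
        return (self.dimension, self.error_type, self.correction_type)[:depth]


def agreement(gold, predicted, depth=3):
    """Fraction of aligned items whose paths match down to `depth` levels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned one-to-one")
    if not gold:
        return 0.0
    matches = sum(g.key(depth) == p.key(depth) for g, p in zip(gold, predicted))
    return matches / len(gold)


# Hypothetical gold annotations and model outputs.
gold = [
    Explanation("orthographic", "ERR_A", "CORR_A1", "placeholder"),
    Explanation("syntactic", "ERR_B", "CORR_B1", "placeholder"),
]
predicted = [
    Explanation("orthographic", "ERR_A", "CORR_A2", "placeholder"),
    Explanation("syntactic", "ERR_B", "CORR_B1", "placeholder"),
]

coarse = agreement(gold, predicted, depth=2)  # match on dimension + error type
fine = agreement(gold, predicted, depth=3)    # also require the correction type
print(coarse, fine)
```

Scoring at several depths is one possible design choice: it separates a model that identifies the right error type but the wrong correction type from one that misses the error altogether, which free-form explanation text does not allow directly.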