Construction and educational application of a linguistically grounded dependency treebank for Uyghur

Developing effective educational technologies for low-resource agglutinative languages such as Uyghur is often hindered by a mismatch between existing annotation frameworks and the language's grammatical structures. To address this challenge, this study introduces the Modern Uyghur Dependency Treebank (MUDT), a linguistically grounded annotation framework designed to capture the agglutinative complexity of Uyghur, including zero-copula constructions and fine-grained case marking. Using a hybrid pipeline that combines Large Language Model pre-annotation with rigorous human correction, we constructed a high-quality treebank of 3,456 sentences. Intrinsic structural evaluation shows that MUDT substantially improves dependency projectivity, reducing the crossing-arc rate from 7.35% under the Universal Dependencies (UD) standard to 0.06%. Extrinsic parsing experiments with UDPipe and Stanza further show that models trained on MUDT achieve higher in-domain accuracy and better cross-domain generalization than UD-based baselines. To validate the practical utility of this resource, we developed an AI-assisted grammar tutoring system that translates MUDT-based syntactic analyses into interpretable pedagogical feedback. In a controlled experiment with 35 second-language learners, students who received syntax-aware feedback achieved significantly higher learning gains than those in the control group. These findings establish MUDT as a robust foundation for syntactic analysis and underscore the role of linguistically informed natural language processing resources in bridging the gap between computational models and the cognitive needs of second-language learners.
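
To make the projectivity metric concrete, the following is a minimal sketch of how a crossing-arc rate could be computed from head indices. It is an illustration under stated assumptions, not the evaluation code used in this study: the rate is assumed here to be the share of non-root arcs that cross at least one other arc, and the function names `crossing_arcs` and `crossing_arc_rate` are hypothetical.

```python
from typing import List


def crossing_arcs(heads: List[int]) -> int:
    """Count arcs in one sentence that cross at least one other arc.

    heads[i] is the 1-based head index of token i + 1; 0 marks the root.
    Assumption: the artificial root arc is excluded from the count.
    """
    # Represent each non-root arc as an ordered (left, right) span.
    arcs = [(min(h, d), max(h, d))
            for d, h in enumerate(heads, start=1) if h != 0]
    crossing = 0
    for i, (a, b) in enumerate(arcs):
        for j, (c, d) in enumerate(arcs):
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if i != j and (a < c < b < d or c < a < d < b):
                crossing += 1
                break
    return crossing


def crossing_arc_rate(treebank: List[List[int]]) -> float:
    """Share of non-root arcs in the treebank that cross another arc."""
    total = sum(1 for heads in treebank for h in heads if h != 0)
    crossed = sum(crossing_arcs(heads) for heads in treebank)
    return crossed / total if total else 0.0


# Toy data (not from MUDT): one projective and one non-projective tree.
projective = [2, 0, 2]
non_projective = [3, 4, 0, 3]
rate = crossing_arc_rate([projective, non_projective])
```

In the toy data, the arcs 1-3 and 2-4 of the second tree cross each other, so two of the five non-root arcs are counted. Other definitions, such as the share of sentences containing any crossing arc, would give different figures.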
