Kyrgyz is a low-resource language, and building high-quality syntactic corpora for it requires significant effort. This study proposes an approach that simplifies the development of a syntactic corpus for Kyrgyz. We present a tool that transfers syntactic annotations from Turkish to Kyrgyz using a treebank translation method. We evaluated the tool on the TueCL treebank. The results show that this approach achieves higher syntactic annotation accuracy than a monolingual model trained on the Kyrgyz KTMU treebank. The study also introduces a method for assessing the complexity of manually annotating the resulting syntactic trees, which contributes to further optimization of the annotation process.
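To make the idea of treebank translation concrete, the sketch below shows one simplified way it could work. It is not the tool described above. It assumes a one-to-one token correspondence between a Turkish sentence and its Kyrgyz translation, uses a hypothetical word-level lexicon, and treats the number of tokens needing a head or label fix as a rough stand-in for manual annotation effort. The names `Token`, `transfer_annotations`, `attachment_scores`, and `correction_effort` are illustrative, as are the example sentence and its dependency labels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface word form
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency relation label


def transfer_annotations(source_tree, lexicon):
    """Keep HEAD and DEPREL from the source tree and swap in target word forms.

    Assumption: every source token maps to exactly one target token in the
    same position. Words missing from the lexicon are left unchanged.
    """
    return [replace(tok, form=lexicon.get(tok.form, tok.form)) for tok in source_tree]


def attachment_scores(predicted, gold):
    """Return (UAS, LAS) for two trees over the same token sequence."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("trees must be non-empty and of equal length")
    n = len(gold)
    uas = sum(p.head == g.head for p, g in zip(predicted, gold)) / n
    las = sum(p.head == g.head and p.deprel == g.deprel
              for p, g in zip(predicted, gold)) / n
    return uas, las


def correction_effort(predicted, gold):
    """Count tokens whose head or label an annotator would have to change.

    This is an assumed proxy, not the complexity measure from the study.
    """
    return sum(p.head != g.head or p.deprel != g.deprel
               for p, g in zip(predicted, gold))


# Illustrative Turkish source tree: "Ben kitap okudum" ("I read a book").
turkish = [
    Token(1, "Ben", 3, "nsubj"),
    Token(2, "kitap", 3, "obj"),
    Token(3, "okudum", 0, "root"),
]

# Hypothetical word-level lexicon for this one sentence.
lexicon = {"Ben": "Мен", "kitap": "китеп", "okudum": "окудум"}

kyrgyz = transfer_annotations(turkish, lexicon)

# Hypothetical gold tree that differs from the transferred tree in one label.
gold = [kyrgyz[0], replace(kyrgyz[1], deprel="obl"), kyrgyz[2]]

uas, las = attachment_scores(kyrgyz, gold)
effort = correction_effort(kyrgyz, gold)
```

In practice the two languages will not always align word for word, so a real pipeline would need an alignment step and handling for inserted or dropped tokens. The sketch leaves both out for brevity.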