Grammatical Error Correction (GEC) research has focused largely on major languages, with comparatively little attention given to low-resource languages such as Esperanto. In this article, we begin to bridge this gap. We first conduct a comprehensive frequency analysis using the Eo-GP dataset, which was created explicitly for this purpose. We then introduce the Eo-GEC dataset, derived from authentic user cases and annotated with fine-grained linguistic details for error identification. In experiments with GPT-3.5 and GPT-4, GPT-4 outperforms GPT-3.5 in both automated and human evaluations, highlighting its efficacy in addressing Esperanto's grammatical peculiarities and illustrating the potential of advanced language models to enhance GEC strategies for less commonly studied languages.