In this paper, we conduct an experimental study of Grammatical Error Correction (GEC). We examine the nuances of single-model systems, compare the efficiency of ensembling and ranking methods, and explore applying large language models to GEC in three roles: as single-model systems, as components of ensembles, and as ranking methods. We set a new state of the art, with F_0.5 scores of 72.8 on CoNLL-2014-test and 81.4 on BEA-test. To support further advances in GEC and ensure the reproducibility of our research, we make our code, trained models, and systems' outputs publicly available.
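F_0.5, the metric reported above, is the F-measure with β = 0.5. It weights precision more heavily than recall, which suits GEC because a wrong correction is usually judged worse than a missed one. The sketch below assumes that edit-level true-positive, false-positive, and false-negative counts are already available. The official M2 scorer (CoNLL-2014) and ERRANT (BEA-2019) derive these counts by aligning system edits with reference edits, and that step is not shown here. The `f_beta` helper, its zero-denominator convention, and the example counts are hypothetical illustrations, not the scorers' actual code.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Edit-level F_beta from true-positive, false-positive, and false-negative counts.

    Assumption: an empty denominator yields 1.0 for precision or recall
    (no proposed edits -> nothing wrong proposed; no gold edits -> nothing missed).
    """
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: same TP, with FP and FN swapped between the two systems.
    print(f_beta(60, 20, 40))  # P = 0.75, R = 0.60
    print(f_beta(60, 40, 20))  # P = 0.60, R = 0.75
```

With identical true positives, the higher-precision system (P = 0.75, R = 0.60) scores about 0.714, and the higher-recall system (P = 0.60, R = 0.75) scores 0.625. This shows how β = 0.5 favors precision.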