Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes not only the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also the correction of orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current state of the art. In this survey paper, we condense the field into a single article. We first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation, as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress, remaining challenges, and suggestions for future work. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.