Quality estimation models have been developed to assess the corrections made by grammatical error correction (GEC) models when reference or gold-standard corrections are not available. An ideal quality estimator can be used to combine the outputs of multiple GEC systems by choosing the best subset of edits from the union of all edits proposed by the base GEC systems. However, we found that existing GEC quality estimation models are not good enough at distinguishing good corrections from bad ones, resulting in a low F0.5 score when used for system combination. In this paper, we propose GRECO, a new state-of-the-art quality estimation model that gives a better estimate of the quality of a corrected sentence, as indicated by a higher correlation with the F0.5 score of the corrected sentence. Using GRECO for system combination results in a combined GEC system with a higher F0.5 score. We also propose three methods of varying generality for using GEC quality estimation models in system combination: model-agnostic, model-agnostic with voting bias, and model-dependent. The combined GEC system outperforms the state of the art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest F0.5 scores published to date.
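To make the idea of choosing the best subset of edits from the union of all proposed edits concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes an edit is a `(start, end, replacement)` token span, searches all non-overlapping subsets by brute force, and takes the quality estimator as an arbitrary callable. The names `Edit`, `apply_edits`, `overlaps`, `best_subset`, `f_beta`, and `toy_score` are hypothetical and introduced only for illustration; `toy_score` is a hand-written stand-in, not a trained model. The helper `f_beta` shows the standard F-beta formula, which weights precision more heavily than recall when beta is 0.5.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumption: an edit replaces tokens[start:end] with the replacement tokens.
Edit = Tuple[int, int, Tuple[str, ...]]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta from edit counts; beta=0.5 favors precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Returning 0.0 for the undefined case is a choice made for this sketch.
    return (1 + b2) * tp / denom if denom else 0.0


def apply_edits(tokens: Sequence[str], edits: Iterable[Edit]) -> List[str]:
    """Apply non-overlapping edits right to left so earlier offsets stay valid."""
    out = list(tokens)
    for start, end, repl in sorted(edits, reverse=True):
        out[start:end] = repl
    return out


def overlaps(a: Edit, b: Edit) -> bool:
    """Two edits conflict when their source spans intersect."""
    return a[0] < b[1] and b[0] < a[1]


def best_subset(
    tokens: Sequence[str],
    system_edits: Iterable[Iterable[Edit]],
    score: Callable[[List[str], List[str]], float],
) -> List[Edit]:
    """Exhaustively pick the conflict-free subset of the edit union that the
    scorer rates highest. Exponential in the number of edits, so toy-sized only."""
    source = list(tokens)
    union = sorted({e for edits in system_edits for e in edits})
    best: List[Edit] = []
    best_score = score(source, source)  # baseline: leave the sentence unchanged
    for k in range(1, len(union) + 1):
        for subset in combinations(union, k):
            if any(overlaps(a, b) for a, b in combinations(subset, 2)):
                continue
            s = score(source, apply_edits(source, subset))
            if s > best_score:
                best, best_score = list(subset), s
    return best


def toy_score(source: List[str], hypothesis: List[str]) -> float:
    """Hypothetical stand-in for a learned quality estimator."""
    return float("went" in hypothesis) - float("the" in hypothesis)


source = "She go to school yesterday".split()
system_a = [(1, 2, ("went",))]
system_b = [(1, 2, ("goes",)), (3, 4, ("the", "school"))]
chosen = best_subset(source, [system_a, system_b], toy_score)
combined = " ".join(apply_edits(source, chosen))
```

In this toy case the scorer prefers the edit from the first system and rejects both edits from the second, so `combined` differs from the source only in the verb. A practical system would replace the exhaustive loop with a search that scales to many edits.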