Grammatical Error Correction (GEC) is the task of rewriting erroneous sentences into grammatically correct, semantically consistent, and coherent ones. Popular GEC models rely either on large-scale synthetic corpora or on a large number of hand-crafted rules. The former are costly to train, while the latter require considerable human expertise. In recent years, Abstract Meaning Representation (AMR), a semantic representation framework, has been widely used in natural language tasks because of its completeness and flexibility. A non-negligible concern is that the AMRs of grammatically incorrect sentences may not be fully reliable. In this paper, we propose AMR-GEC, a seq-to-seq model that incorporates denoised AMR as additional knowledge. Specifically, we design a semantic aggregated GEC model and explore denoising methods to make the AMRs more reliable. Experiments on the BEA-2019 shared task and the CoNLL-2014 shared task show that AMR-GEC performs comparably to a set of strong baselines trained with a large amount of synthetic data. Compared with the T5 model trained with synthetic data, AMR-GEC reduces training time by 32% while keeping inference time comparable. To the best of our knowledge, we are the first to incorporate AMR into grammatical error correction.