Pre-trained seq2seq models have achieved state-of-the-art results in grammatical error correction. However, these models still suffer from a prediction bias caused by their unidirectional decoding. We therefore propose a bidirectional Transformer reranker (BTR) that re-estimates the probability of each candidate sentence generated by the pre-trained seq2seq model. The BTR preserves the seq2seq-style Transformer architecture but uses a BERT-style self-attention mechanism in the decoder: it computes the probability of each target token with masked language modeling, which captures bidirectional representations from the target context. To guide the reranking, the BTR adopts negative sampling in the objective function to minimize the unlikelihood. During inference, the BTR produces its final output by comparing the reranked top-1 result with the original one using an acceptance threshold. Experimental results show that, when reranking candidates from a pre-trained seq2seq model, T5-base, the BTR on top of T5-base yields F0.5 scores of 65.47 and 71.27 on the CoNLL-14 and BEA test sets, respectively, and a GLEU score of 59.52 on the JFLEG corpus, improvements of 0.36, 0.76, and 0.48 points over the original T5-base. Furthermore, when reranking candidates from T5-large, the BTR on top of T5-base improves on the original T5-large by 0.26 points on the BEA test set.
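The inference procedure described in the abstract (score each candidate with masked, bidirectional token probabilities, then accept the reranked top-1 only if it clears a threshold) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `token_log_prob` callable, the `MASK` symbol, the one-token-at-a-time masking, the summation of log-probabilities, and the "score difference greater than threshold" acceptance rule are all hypothetical choices made for the example; the abstract does not specify them.

```python
from typing import Callable, List, Sequence

# Hypothetical mask symbol; a real model would use its own mask token.
MASK = "<mask>"

# Hypothetical scorer signature (an assumption, not an API from the paper):
# (source tokens, target tokens with one position masked, masked position,
#  original token at that position) -> log-probability of the original token.
TokenLogProb = Callable[[Sequence[str], Sequence[str], int, str], float]


def masked_sentence_score(
    source: Sequence[str],
    target: Sequence[str],
    token_log_prob: TokenLogProb,
) -> float:
    """Sum the log-probability of each target token when it alone is masked.

    Because every other target token stays visible, each prediction can use
    context on both sides of the masked position.
    """
    total = 0.0
    for i, tok in enumerate(target):
        masked: List[str] = list(target)
        masked[i] = MASK
        total += token_log_prob(source, masked, i, tok)
    return total


def rerank_with_threshold(
    source: Sequence[str],
    candidates: Sequence[Sequence[str]],
    token_log_prob: TokenLogProb,
    threshold: float,
) -> Sequence[str]:
    """Return the reranked top-1 only if it beats the base top-1 by a margin.

    Assumes candidates[0] is the base seq2seq model's own top-1 output.
    """
    scores = [masked_sentence_score(source, c, token_log_prob) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    # Assumed acceptance rule: keep the original unless the margin is large enough.
    if best != 0 and scores[best] - scores[0] > threshold:
        return candidates[best]
    return candidates[0]
```

With a larger `threshold`, the sketch falls back to the base model's output more often, so the reranker only overrides the original prediction when it is comparatively confident.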