An emerging research direction in neural machine translation (NMT) involves Quality Estimation (QE) models, which correlate highly with human judgment and can improve translations through Quality-Aware Decoding. Although several approaches based on sampling multiple candidate translations have been proposed, none has integrated these models directly into the decoding process. In this paper, we address this gap by proposing a novel token-level QE model capable of reliably scoring partial translations. We build a unidirectional QE model for this purpose, as decoder models are inherently trained on partial sequences and process them efficiently. We then present a decoding strategy that integrates the QE model for Quality-Aware Decoding and demonstrate that translation quality improves over N-best list re-ranking with state-of-the-art QE models (by up to $1.39$ XCOMET-XXL $\uparrow$). Finally, we show that our approach provides significant benefits in document translation tasks, where the quality of N-best lists is typically suboptimal.
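To make the idea of scoring partial translations during decoding concrete, the following is a minimal sketch of a beam search that re-scores every extended prefix with a QE model. It is an illustration under stated assumptions, not the method of this paper: the function names (`quality_aware_beam_search`, `toy_nmt`, `toy_qe`), the weight `alpha`, and the log-linear combination of NMT log-probability and QE score are all hypothetical choices, and the two toy models stand in for a real NMT system and a real unidirectional QE model.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def quality_aware_beam_search(
    next_token_logprobs: Callable[[Prefix], Dict[str, float]],  # assumed NMT interface
    qe_score: Callable[[Prefix], float],  # assumed QE interface, returns a value in (0, 1]
    beam_size: int = 2,
    max_len: int = 5,
    alpha: float = 1.0,  # hypothetical weight of the QE term
    eos: str = "</s>",
) -> Prefix:
    """Return the hypothesis with the best combined NMT + QE score."""
    beams: List[Tuple[float, Prefix, float]] = [(0.0, (), 0.0)]  # (combined, prefix, nmt_logp)
    finished: List[Tuple[float, Prefix]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, Prefix, float]] = []
        for _, prefix, logp in beams:
            for tok, tok_logp in next_token_logprobs(prefix).items():
                new_prefix = prefix + (tok,)
                new_logp = logp + tok_logp
                # Assumed combination: NMT log-prob plus weighted log of the QE
                # score of the *partial* translation (clamped to avoid log(0)).
                combined = new_logp + alpha * math.log(max(qe_score(new_prefix), 1e-9))
                candidates.append((combined, new_prefix, new_logp))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for combined, prefix, logp in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((combined, prefix))
            else:
                beams.append((combined, prefix, logp))
        if not beams:
            break
    # Fall back to unfinished hypotheses if nothing reached end-of-sequence.
    pool = finished or [(combined, prefix) for combined, prefix, _ in beams]
    return max(pool, key=lambda h: h[0])[1]


def toy_nmt(prefix: Prefix) -> Dict[str, float]:
    """Toy stand-in for an NMT model: slightly prefers the token 'bad'."""
    if not prefix:
        return {"good": math.log(0.4), "bad": math.log(0.6)}
    return {"</s>": 0.0}


def toy_qe(prefix: Prefix) -> float:
    """Toy stand-in for a token-level QE model scoring a partial translation."""
    return 0.1 if "bad" in prefix else 0.9


with_qe = quality_aware_beam_search(toy_nmt, toy_qe, alpha=1.0)
without_qe = quality_aware_beam_search(toy_nmt, toy_qe, alpha=0.0)
```

In this toy setting, setting `alpha=0.0` reduces the search to ordinary beam search, which follows the NMT model's preference, while `alpha=1.0` lets the QE score of each prefix steer the search toward the higher-quality hypothesis before any complete candidate exists.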