Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, so that better translations are more likely. However, research has shown that this assumption does not always hold, and decoding strategies that directly optimize a utility function, such as Minimum Bayes Risk (MBR) or Quality-Aware decoding, can significantly improve translation quality over standard MAP decoding. The main disadvantage of these methods is that they require an additional model to predict the utility and additional steps during decoding, which makes the entire process computationally demanding. In this paper, we propose to make the NMT models themselves quality-aware by training them to estimate the quality of their own output. During decoding, we can use the model's own quality estimates to guide the generation process and produce the highest-quality translations possible. We demonstrate that the model can evaluate its own output during translation, eliminating the need for a separate quality estimation model. Moreover, we show that using this quality signal as a prompt during MAP decoding can significantly improve translation quality. When using the internal quality estimate to prune the hypothesis space during MBR decoding, we can not only further improve translation quality, but also reduce inference time by two orders of magnitude.
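To illustrate why pruning the hypothesis space makes MBR decoding cheaper, the following is a minimal sketch and not the implementation used in this work. It assumes each sampled candidate already comes with a scalar quality score estimated by the NMT model itself. The function names (`token_f1`, `mbr_decode`, `pruned_mbr_decode`), the `keep` parameter, and the unigram-F1 utility are hypothetical stand-ins chosen for illustration; in practice the utility would be a learned metric.

```python
from collections import Counter


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility (an assumption): unigram F1 between whitespace-tokenized strings."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, utility=token_f1):
    """Return the candidate with the highest average utility against all others."""
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


def pruned_mbr_decode(candidates, quality_scores, keep, utility=token_f1):
    """Keep the `keep` candidates with the highest self-estimated quality, then run MBR.

    `quality_scores` are assumed to be the model's own quality estimates,
    one per candidate (hypothetical interface).
    """
    ranked = sorted(zip(candidates, quality_scores), key=lambda p: p[1], reverse=True)
    survivors = [c for c, _ in ranked[:keep]]
    return mbr_decode(survivors, utility)


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat down", "a dog ran", "the cat sits"]
    scores = [0.9, 0.8, 0.1, 0.7]  # made-up quality estimates for illustration
    print(pruned_mbr_decode(samples, scores, keep=3))
```

Full MBR over N candidates needs N(N-1) utility evaluations in this formulation, whereas the pruned variant needs only k(k-1) for the k surviving candidates. Since each evaluation of a learned utility metric is itself expensive, shrinking the candidate set is where most of the savings would come from.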