Like many other machine learning applications, neural machine translation (NMT) benefits from over-parameterized deep neural models. However, these models have been observed to be brittle: NMT model predictions are sensitive to small input changes and can show significant variation across re-training or incremental model updates. This work studies a frequently used method in NMT, pseudo-label training (PLT), which is common to the related techniques of forward translation (or self-training) and sequence-level knowledge distillation. While the effect of PLT on quality is well documented, we highlight a lesser-known effect: PLT can enhance a model's stability under model updates and input perturbations, a set of properties we call model inertia. We study inertia effects under different training settings and identify distribution simplification as a mechanism behind the observed results.