Neural Machine Translation (NMT) models have achieved considerable success, but their performance remains poor when translating in new domains with a limited amount of data. In this paper, we present Epi-Curriculum, a novel approach to low-resource domain adaptation (DA) that combines a new episodic training framework with denoised curriculum learning. Our episodic training framework enhances the model's robustness to domain shift by episodically exposing the encoder/decoder to an inexperienced decoder/encoder. Denoised curriculum learning filters out noisy data and further improves the model's adaptability by gradually guiding the learning process from easy to more difficult tasks. Experiments on English-German and English-Romanian translation show that (i) Epi-Curriculum improves the model's robustness and adaptability in both seen and unseen domains, and (ii) our episodic training framework enhances the robustness of both the encoder and the decoder to domain shift.
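To make the idea of denoised curriculum learning concrete, the following is a minimal sketch of one way to filter noisy sentence pairs and then schedule the remainder from easy to difficult. It is an illustration under stated assumptions, not the procedure used in this paper: the `Pair` record, its `difficulty` and `noise` scores, the `max_noise` threshold, and the `build_curriculum` function are all hypothetical, and how such scores would be computed is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    """A parallel sentence pair with two hypothetical, precomputed scores."""
    source: str
    target: str
    difficulty: float  # assumed: higher means harder to translate
    noise: float       # assumed: higher means more likely to be noisy


def build_curriculum(pairs: List[Pair], max_noise: float,
                     num_stages: int) -> List[List[Pair]]:
    """Drop pairs judged too noisy, then split the rest into stages
    ordered from easiest to hardest."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Denoising step: keep only pairs at or below the noise threshold.
    kept = [p for p in pairs if p.noise <= max_noise]
    # Curriculum step: order the surviving pairs by increasing difficulty.
    kept.sort(key=lambda p: p.difficulty)
    if not kept:
        return []
    # Ceiling division so every pair is assigned to some stage.
    stage_size = -(-len(kept) // num_stages)
    return [kept[i:i + stage_size] for i in range(0, len(kept), stage_size)]
```

A training loop built on this sketch would iterate over the returned stages in order, so that the model sees the easiest retained pairs first and the hardest last.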