Diffusion models have recently excelled in image generation and have also been applied to natural language processing (NLP) for controllable text generation. However, their application in cross-lingual settings remains largely unexplored. Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining remains understudied. To address these gaps, we propose XDLM, a novel cross-lingual diffusion model for machine translation, consisting of a pretraining stage and a fine-tuning stage. In the pretraining stage, we propose TLDM, a new training objective for learning the mapping between different languages; in the fine-tuning stage, we build the translation system on top of the pretrained model. We evaluate XDLM on several machine translation benchmarks, where it outperforms both diffusion and Transformer baselines.