Diffusion models have recently excelled in image generation and have also been applied to natural language processing (NLP) for controllable text generation. However, their application in cross-lingual settings remains largely unexplored. Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining remains understudied. To address these gaps, we propose XDLM, a novel cross-lingual diffusion model for machine translation, consisting of a pretraining stage and a fine-tuning stage. In the pretraining stage, we propose TLDM, a new training objective for learning the mapping between different languages; in the fine-tuning stage, we build the translation system on top of the pretrained model. We evaluate XDLM on several machine translation benchmarks, where it outperforms both diffusion and Transformer baselines.