Diffusion models have emerged as a new state-of-the-art family of deep generative models, and their potential for text generation has recently attracted increasing attention. Existing studies mostly adopt a single-encoder architecture with a partial noising process for conditional text generation, but this design offers limited flexibility for conditional modeling. By contrast, an encoder-decoder architecture is naturally more flexible because its encoder and decoder modules are detachable, which makes it extensible to multilingual and multimodal generation tasks across conditions and target texts. However, in such an architecture the encoding of the conditional text lacks any understanding of the target text. To address this, a spiral interaction architecture for encoder-decoder text diffusion (DiffuSIA) is proposed. Concretely, the conditional information from the encoder is captured by the diffusion decoder, while the target information from the decoder is captured by the conditional encoder. These two information flows pass spirally through multiple interaction layers for deep fusion and understanding. DiffuSIA is evaluated on four text generation tasks: paraphrase, text simplification, question generation, and open-domain dialogue generation. Experimental results show that DiffuSIA performs competitively with previous methods on all four tasks, demonstrating the effectiveness and generalization ability of the proposed method.
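The two-way information flow described above can be pictured with a small sketch. This is not the DiffuSIA implementation: the abstract does not specify the attention mechanism, the order of the two updates, or any parameters, so everything below is an assumption made for illustration. The names `cross_attend` and `spiral_interaction`, the unparameterised residual dot-product attention, and the choice to update the decoder before the encoder in each layer are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, memory):
    """Residual dot-product cross-attention without learned weights.

    Each query vector is updated by adding a softmax-weighted average
    of the memory vectors. A real model would use learned projections;
    they are omitted here to keep the sketch short (assumption).
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        weights = softmax(scores)
        context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]
        out.append([qi + ci for qi, ci in zip(q, context)])
    return out


def spiral_interaction(cond, noisy_target, num_layers=3):
    """Alternate the two information flows across layers (assumed ordering)."""
    enc, dec = cond, noisy_target
    for _ in range(num_layers):
        dec = cross_attend(dec, enc)  # condition -> target: decoder reads encoder states
        enc = cross_attend(enc, dec)  # target -> condition: encoder reads decoder states
    return enc, dec


random.seed(0)
# Hypothetical toy inputs: 5 condition tokens and 3 noised target tokens, 4 dimensions each.
cond = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
noisy_target = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

enc_out, dec_out = spiral_interaction(cond, noisy_target)
print(len(enc_out), len(dec_out))  # sequence lengths are preserved: 5 3
```

In this sketch the encoder states after the first layer already depend on the noised target, which is the property the abstract highlights. How the actual model injects the diffusion timestep or trains the denoiser is not covered here.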