Conditional variational autoencoders (CVAEs) have recently been used for diverse response generation, introducing latent variables to represent the relationship between a dialog context and its potential responses. However, the diversity of the responses a CVAE generates is limited by the oversimplified assumption of an isotropic Gaussian prior. We propose Dior-CVAE, a hierarchical CVAE model with an informative prior produced by a diffusion model. Dior-CVAE derives a series of layer-wise latent variables using an attention mechanism and infuses them into the corresponding decoder layers. We propose memory dropout in the latent infusion to alleviate posterior collapse. The prior distribution of the latent variables is parameterized by a diffusion model, which makes the prior multimodal. Experiments on two popular open-domain dialog datasets indicate the advantages of our approach over previous Transformer-based variational dialog models in dialog response generation. We publicly release the code for reproducing Dior-CVAE and all baselines at https://github.com/SkyFishMoon/Latent-Diffusion-Response-Generation.
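For readers unfamiliar with the setup, the standard CVAE training objective can be written as below. This is the textbook formulation, not necessarily the exact notation or loss used in Dior-CVAE; the symbols c (dialog context), r (response), z (latent variable), and the parameter subscripts are assumptions introduced here for illustration.

```latex
\log p_\theta(r \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid r, c)}\!\left[ \log p_\theta(r \mid z, c) \right]
  \;-\; \mathrm{KL}\!\left( q_\phi(z \mid r, c) \,\|\, p_\theta(z \mid c) \right)
```

In a conventional CVAE the prior p_\theta(z \mid c) is a Gaussian with isotropic or diagonal covariance, which is unimodal. The abstract's proposal amounts to replacing this prior with one parameterized by a diffusion model. Posterior collapse corresponds to the KL term being driven toward zero so that the decoder effectively ignores z.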