Textual response generation is an essential task for multimodal task-oriented dialog systems. Although existing studies have made fruitful progress, they still suffer from two critical limitations: 1) they focus on attribute knowledge but ignore relation knowledge, which can reveal the correlations between different entities and hence promote response generation, and 2) they apply only output-level supervision based on the cross-entropy loss and lack representation-level regularization. To address these limitations, we devise a novel multimodal task-oriented dialog system (named MDS-S2). Specifically, MDS-S2 first simultaneously acquires the context-related attribute and relation knowledge from the knowledge base, where the non-intuitive relation knowledge is extracted by an n-hop graph walk. Thereafter, considering that attribute knowledge and relation knowledge can benefit responding to different levels of questions, we design a multi-level knowledge composition module in MDS-S2 to obtain the latent composed response representation. Moreover, we devise a set of latent query variables to distill the semantic information from the composed response representation and the ground truth response representation, respectively, and thus conduct representation-level semantic regularization. Extensive experiments on a public dataset have verified the superiority of our proposed MDS-S2. We have released the code and parameters to facilitate the research community.
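To make the idea of an n-hop graph walk concrete, the following is a minimal sketch and not the MDS-S2 implementation. It assumes the knowledge base is stored as an adjacency dictionary of (relation, tail) pairs; the function name `n_hop_relations`, the toy entities, and the breadth-first traversal order are all illustrative assumptions.

```python
from collections import deque


def n_hop_relations(graph, seeds, n):
    """Collect (head, relation, tail) triples within n hops of the seed entities.

    `graph` maps an entity to a list of (relation, tail) pairs. This layout is
    an assumption made for the sketch, not the paper's data format.
    """
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    triples = []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == n:
            # Entities at the hop limit are not expanded further.
            continue
        for relation, tail in graph.get(entity, []):
            triples.append((entity, relation, tail))
            if tail not in seen:
                seen.add(tail)
                frontier.append((tail, depth + 1))
    return triples


# Hypothetical toy knowledge base, invented for illustration only.
toy_kb = {
    "Hotel A": [("near", "Mall B")],
    "Mall B": [("contains", "Cafe C")],
    "Cafe C": [("serves", "coffee")],
}

one_hop = n_hop_relations(toy_kb, ["Hotel A"], 1)
two_hop = n_hop_relations(toy_kb, ["Hotel A"], 2)
```

With one hop, only the direct relation of the seed entity is returned; with two hops, the walk also surfaces the less obvious link between "Mall B" and "Cafe C", which is the kind of non-intuitive relation knowledge the abstract refers to.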