Task-oriented dialog systems empower users to accomplish their goals through intuitive and expressive natural language interactions. State-of-the-art approaches formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in a supervised setting. This requires labeled training data for each new domain or task, and acquiring such data is prohibitively laborious and expensive, which makes it a bottleneck for scaling systems to a wide range of domains. To overcome this challenge, we introduce ZS-ToD, a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system that leverages domain schemas for robust generalization to unseen domains and exploits effective summarization of the dialog history. We employ GPT-2 as the backbone model and introduce a two-step training process: the first step learns the general structure of the dialog data, and the second step optimizes response generation together with intermediate outputs such as the dialog state and system actions. In contrast to state-of-the-art systems, which are trained to fulfill certain intents in the given domains and memorize task-specific conversational patterns, ZS-ToD learns generic task-completion skills by comprehending domain semantics via domain schemas, and it generalizes seamlessly to unseen domains. We conduct an extensive experimental evaluation on the SGD and SGD-X datasets, which span up to 20 unique domains. ZS-ToD outperforms state-of-the-art systems on key metrics, with an improvement of +17% on joint goal accuracy and +5 on inform. Additionally, we present a detailed ablation study to demonstrate the effectiveness of the proposed components and training mechanism.
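For readers unfamiliar with the joint goal accuracy metric reported above, the following is a minimal sketch of how it is commonly computed: a turn counts as correct only if the predicted dialog state matches the reference state on every slot. The function name `joint_goal_accuracy`, the dictionary representation of a dialog state, and the example slots are assumptions made for illustration. They are not taken from the ZS-ToD implementation, and benchmark-specific scoring rules (for example, lenient matching of free-form slot values) may differ from the strict equality used here.

```python
from typing import Dict, List

# Hypothetical representation: a dialog state maps slot names to values.
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Return the fraction of turns whose predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    # A turn is correct only if every slot and every value matches.
    exact = sum(1 for p, g in zip(predicted, gold) if p == g)
    return exact / len(gold)


# Illustrative two-turn dialog: the second turn has one wrong slot value.
gold = [
    {"city": "Paris"},
    {"city": "Paris", "date": "tomorrow"},
]
pred = [
    {"city": "Paris"},
    {"city": "Paris", "date": "today"},
]
print(joint_goal_accuracy(pred, gold))  # 0.5
```

Because a single wrong slot invalidates the whole turn, this metric is considerably stricter than per-slot accuracy.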