Task-oriented dialogue is difficult in part because it involves understanding user intent, collecting information from the user, executing API calls, and generating helpful and fluent responses. For complex tasks, an agent must also do all of these things correctly over multiple steps and in a specific order. While large pre-trained language models can be fine-tuned end-to-end to create multi-step task-oriented dialogue agents that generate fluent text, our experiments confirm that this approach alone cannot reliably perform new multi-step tasks that are unseen during training. To address this limitation, we augment the dialogue contexts given to \textmd{text2text} transformers with known \textit{valid workflow names} and \textit{action plans}. An action plan is the sequence of actions required to accomplish a task, encoded as a simple sequence of keywords (e.g., verify-identity, pull-up-account, reset-password). We perform extensive experiments on the Action-Based Conversations Dataset (ABCD) with T5-small, T5-base, and T5-large models, and show that such models (a) generalize more readily to unseen workflows by following the provided plan, and (b) can execute unseen actions if those actions are provided in the plan. In contrast, models that are not given action plan information are unable to fully accomplish new multi-step tasks, even when given new valid workflow names.
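As a rough illustration of the augmentation described above, the following minimal sketch prepends a workflow name and a keyword-encoded action plan to a dialogue context before it is passed to a text2text model. The function name `build_context`, the field labels (`workflow:`, `plan:`, `context:`), the ` | ` separator, and the workflow name `recover-password` are all hypothetical choices made for this example. They are assumptions, not the input format actually used in the experiments.

```python
from typing import List, Optional, Tuple


def build_context(
    turns: List[Tuple[str, str]],
    workflow: Optional[str] = None,
    plan: Optional[List[str]] = None,
) -> str:
    """Serialize a dialogue into one input string for a text2text model.

    The field labels and separator below are illustrative assumptions,
    not a format taken from the source.
    """
    parts = []
    if workflow is not None:
        # Known valid workflow name, if available.
        parts.append(f"workflow: {workflow}")
    if plan:
        # Action plan encoded as a simple sequence of keywords.
        parts.append("plan: " + ", ".join(plan))
    # Flatten the dialogue turns into "speaker: utterance" segments.
    dialogue = " ".join(f"{speaker}: {text}" for speaker, text in turns)
    parts.append("context: " + dialogue)
    return " | ".join(parts)


turns = [
    ("customer", "I forgot my password."),
    ("agent", "I can help with that."),
]
plan = ["verify-identity", "pull-up-account", "reset-password"]

# "recover-password" is a hypothetical workflow name used for illustration.
augmented = build_context(turns, workflow="recover-password", plan=plan)
baseline = build_context(turns)  # no workflow name, no plan
```

Calling `build_context` without `workflow` or `plan` yields the plain dialogue context, which corresponds to the unaugmented setting that the abstract contrasts against.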