Collecting annotated dialogs for training task-oriented dialog systems has been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident whether similar generative models can produce the large variety of often unexpected user inputs that real dialog systems encounter in practice. Existing data augmentation techniques such as paraphrase generation do not take the dialog context into consideration. In this paper, we develop a novel dialog augmentation model that generates a user turn conditioned on the full dialog context. Additionally, with a new prompt design for the language model and output re-ranking, the dialogs generated by our model can be used directly to train downstream dialog systems. On the common benchmark datasets MultiWoZ and SGD, we show that our dialog augmentation model generates high-quality dialogs and improves dialog success rate by as much as $8\%$ over the baseline.
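To make the pipeline described in the abstract more concrete, the following is a minimal sketch of its two generic steps: flattening the full dialog context into a prompt that ends at the next user turn, and re-ranking candidate user turns. The abstract does not specify the prompt format or the re-ranking criterion, so everything here is an assumption for illustration: the function names `build_prompt`, `rerank`, and `novelty_score`, the `Speaker: utterance` layout, and the toy novelty scorer are hypothetical and are not the authors' design. The candidates would come from a language model, which is left out of the sketch.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def build_prompt(context: List[Turn], next_speaker: str = "User") -> str:
    """Flatten the full dialog history into one prompt ending with the next speaker tag.

    The "Speaker: utterance" layout is an assumption, not the paper's prompt design.
    """
    lines = [f"{speaker}: {utterance}" for speaker, utterance in context]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


def novelty_score(context: List[Turn]) -> Callable[[str], float]:
    """Toy scorer (hypothetical): fraction of candidate tokens not seen in the context."""
    seen = {tok.lower() for _, utterance in context for tok in utterance.split()}

    def score(candidate: str) -> float:
        tokens = candidate.lower().split()
        if not tokens:
            return 0.0
        return sum(tok not in seen for tok in tokens) / len(tokens)

    return score


def rerank(candidates: List[str], score: Callable[[str], float], top_k: int = 1) -> List[str]:
    """Drop empty and duplicate candidates, then keep the top_k by score, highest first."""
    unique = list(dict.fromkeys(c.strip() for c in candidates if c.strip()))
    return sorted(unique, key=score, reverse=True)[:top_k]
```

In use, `build_prompt(context)` would be passed to a language model to sample several candidate user turns, and `rerank(candidates, novelty_score(context))` would select which ones to keep as augmented training data. Any other scoring function with the same signature could be swapped in for the toy scorer.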