Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues serve functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat/open-domain dialogues focus on holding a socially engaging conversation with a user. However, humans tend to switch seamlessly between these modes and even use chit-chat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created that blend both communication modes within conversation examples. These approaches typically rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, however, we ask whether these task-oriented datasets do not \textit{already} contain chit-chat sequences. Using topic modeling and searching for the topics most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on how chit-chat is combined with task-oriented dialogues.
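The retrieval step is only named at a high level here. As a rough illustration, assume topics are already available as word-weight distributions (for example, the per-topic word probabilities of an LDA model), and use plain cosine similarity against a binary keyword vector as a stand-in similarity measure. Ranking topics by closeness to a social-talk keyword list could then look like the sketch below. The function names `cosine_to_keywords` and `rank_topics`, the example topics, and the keyword list are all hypothetical; this is not the study's actual pipeline or similarity measure.

```python
import math


def cosine_to_keywords(topic: dict[str, float], keywords: set[str]) -> float:
    """Cosine similarity between a topic's word-weight vector and a
    binary (0/1) indicator vector over the keyword set (hypothetical measure)."""
    # Dot product: only words shared with the keyword set contribute.
    dot = sum(weight for word, weight in topic.items() if word in keywords)
    topic_norm = math.sqrt(sum(weight * weight for weight in topic.values()))
    keyword_norm = math.sqrt(len(keywords))  # norm of a 0/1 vector
    if topic_norm == 0 or keyword_norm == 0:
        return 0.0
    return dot / (topic_norm * keyword_norm)


def rank_topics(
    topics: dict[int, dict[str, float]], keywords: set[str], top_k: int = 3
) -> list[tuple[int, float]]:
    """Return the top_k (topic_id, score) pairs, most keyword-like first."""
    scored = [(tid, cosine_to_keywords(words, keywords)) for tid, words in topics.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical topics, as a topic model might produce them.
    example_topics = {
        0: {"train": 0.4, "ticket": 0.3, "leave": 0.2, "station": 0.1},
        1: {"thanks": 0.3, "great": 0.3, "day": 0.2, "hope": 0.2},
        2: {"restaurant": 0.5, "cheap": 0.3, "food": 0.2},
    }
    # Hypothetical social-talk keywords.
    social_keywords = {"thanks", "great", "hope", "weekend", "fun"}
    # Topic 1 (thanks/great/hope) ranks first; the task topics score 0.
    print(rank_topics(example_topics, social_keywords))
```

A binary keyword vector keeps the sketch dependency-free; an embedding-based similarity would also catch related words that are not literally in the keyword list.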