We introduce the StatCan Dialogue Dataset, consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5,000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an ongoing conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations: performance drops significantly on both tasks when we move from the validation set to the test set. In addition, we find that response generation models struggle to decide when to return a table. Since both tasks pose significant challenges to existing models, we encourage the community to develop models for them; such models can be used directly to help knowledge workers find relevant tables for live chat users.