Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information gathering. How to utilize TOD accurately, efficiently, and effectively for information gathering has long been a critical yet challenging problem. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can significantly enhance the performance of TOD through fine-tuning. However, current datasets primarily cater to user-led systems and are limited to predefined specific scenarios and slots, which constrains the proactiveness, diversity, and capabilities of TOD. In this study, we present a detailed multi-domain task-oriented dialogue data construction process and a Chinese dialogue dataset generated with this process, \textbf{TransferTOD}, which authentically simulates human-machine dialogues in 30 popular life-service scenarios. Leveraging this dataset, we trained a \textbf{TransferTOD-7B} model using full-parameter fine-tuning, which shows notable abilities in slot filling and questioning. Our work demonstrates the model's strong generalization capabilities in various downstream scenarios, significantly improving both data utilization efficiency and system performance. The data is released at https://github.com/KongLongGeFDU/TransferTOD.