Conversational agents powered by Large Language Models (LLMs) demonstrate strong performance across a variety of tasks. Despite their improved user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this challenge, we propose Planning-based Conversational Agents (PCA), a novel dialogue framework that enhances the controllability of LLM-driven agents. Specifically, our approach introduces Standard Operating Procedures (SOPs) to regulate dialogue flow. To enable PCA to learn SOPs, we curate a dataset of SOP-annotated multi-scenario dialogues, generated with a semi-automated role-playing system built on GPT-4o and validated through strict manual quality control. In addition, we propose a novel method that integrates Chain-of-Thought reasoning with supervised fine-tuning for SOP prediction and employs Monte Carlo Tree Search for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method: it achieves a 27.95% improvement in action accuracy over GPT-3.5-based baselines and also yields notable gains for open-source models. Our dataset and code are publicly available.
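The action-planning step mentioned above relies on Monte Carlo Tree Search over candidate dialogue actions. As a rough illustration of how MCTS can pick the next action, here is a minimal sketch; the action names (`ask_clarification`, `provide_info`, etc.) and the toy reward function are illustrative assumptions, not the paper's actual SOP states or scoring model.

```python
import math
import random

random.seed(0)

# Hypothetical candidate actions at every dialogue turn (illustrative only).
ACTIONS = ["ask_clarification", "provide_info", "confirm", "close"]

def rollout_reward(path):
    # Toy stand-in for a trajectory score: reward providing information
    # and finishing with a confirm -> close sequence.
    score = 0.0
    if "provide_info" in path:
        score += 0.3
    if path[-2:] == ["confirm", "close"]:
        score += 0.7
    return score

class Node:
    def __init__(self, path):
        self.path = path          # action sequence from the root
        self.children = {}        # action -> Node
        self.visits = 0
        self.value = 0.0          # cumulative rollout reward

    def ucb_child(self, c=1.4):
        # UCB1: exploit mean value, explore rarely visited children.
        return max(
            self.children.values(),
            key=lambda n: n.value / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )

def mcts(root, depth=4, iters=500):
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and len(node.path) < depth:
            node = node.ucb_child()
        # 2. Expansion: try one untried action.
        if len(node.path) < depth:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.path + [action])
            node = node.children[action]
        # 3. Simulation: random rollout to the planning horizon.
        path = list(node.path)
        while len(path) < depth:
            path.append(random.choice(ACTIONS))
        reward = rollout_reward(path)
        # 4. Backpropagation: update statistics along the selected path.
        current = root
        current.visits += 1
        current.value += reward
        for a in node.path:
            current = current.children[a]
            current.visits += 1
            current.value += reward

def best_action(root):
    # Most-visited child is the planned next dialogue action.
    return max(root.children, key=lambda a: root.children[a].visits)

root = Node([])
mcts(root)
print("planned next action:", best_action(root))
```

In a real system the random rollout would be replaced by the fine-tuned model's SOP predictions, and the reward by a task-completion signal; the search skeleton stays the same.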