Proactive dialogues are a practical yet challenging problem in the era of large language models (LLMs), and dialogue policy planning is the key to improving the proactivity of LLMs. Most existing studies enable dialogue policy planning in LLMs through various prompting schemes, or iteratively enhance this capability for a given case using verbal AI feedback. However, these approaches are either bounded by the policy planning capability of the frozen LLM or hard to transfer to new cases. In this work, we introduce a new dialogue policy planning paradigm, named PPDPP, which strategizes LLMs for proactive dialogue problems using a tunable language model plug-in as a plug-and-play dialogue policy planner. Specifically, we develop a novel training framework that supports supervised fine-tuning on available human-annotated data as well as reinforcement learning from goal-oriented AI feedback, using dynamic interaction data collected through LLM-based self-play simulation. In this way, the LLM-powered dialogue agent can not only generalize to different cases after training but also be applied to different applications simply by substituting the learned plug-in. In addition, we propose evaluating the policy planning capability of dialogue systems in an interactive setting. Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three proactive dialogue applications: negotiation, emotional support, and tutoring dialogues.
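The two-stage training framework described above (supervised fine-tuning, then reinforcement learning from self-play feedback) can be illustrated with a toy sketch. This is not the PPDPP implementation. It relies on heavy simplifying assumptions: the tunable language model plug-in is replaced by a tabular softmax policy (`PolicyPlugin`) over a hypothetical set of negotiation strategies (`STRATEGIES`), dialogue states are collapsed to three hypothetical stages (`STATES`), and the LLM-based self-play simulation with goal-oriented AI feedback is replaced by a scripted scorer (`simulate_episode`, `PREFERRED`). The RL stage uses plain REINFORCE with a moving-average baseline, chosen here only for brevity.

```python
import math
import random

# Hypothetical action space and dialogue stages for a toy negotiation task.
STRATEGIES = ["greet", "propose_price", "concede", "emphasize_value"]
STATES = ["opening", "haggling", "closing"]
# Hypothetical "ideal" strategy per stage, used only by the scripted scorer below.
PREFERRED = {"opening": "greet", "haggling": "emphasize_value", "closing": "propose_price"}


class PolicyPlugin:
    """Tabular softmax policy standing in for the tunable LM plug-in (simplification)."""

    def __init__(self):
        self.logits = {s: [0.0] * len(STRATEGIES) for s in STATES}

    def probs(self, state):
        # Numerically stable softmax over the strategy logits for this state.
        zs = self.logits[state]
        m = max(zs)
        exps = [math.exp(z - m) for z in zs]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, state, rng):
        # Stochastic choice, used for exploration during self-play.
        return rng.choices(range(len(STRATEGIES)), weights=self.probs(state))[0]

    def plan(self, state):
        # Greedy choice; in a full system this strategy would be handed to the frozen LLM as an instruction.
        p = self.probs(state)
        return STRATEGIES[max(range(len(p)), key=p.__getitem__)]

    def update(self, state, action, scale, lr):
        # Gradient of log softmax(action) w.r.t. the logits is onehot(action) - probs.
        p = self.probs(state)
        for i in range(len(STRATEGIES)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[state][i] += lr * scale * grad


def simulate_episode(policy, rng):
    """Scripted stand-in for LLM self-play plus goal-oriented AI feedback (hypothetical)."""
    trajectory = [(state, policy.sample(state, rng)) for state in STATES]
    hits = sum(STRATEGIES[a] == PREFERRED[s] for s, a in trajectory)
    return trajectory, hits / len(trajectory)  # scalar reward in [0, 1]


def train(seed=0, sft_epochs=5, rl_episodes=2000, lr=0.1):
    rng = random.Random(seed)
    policy = PolicyPlugin()

    # Stage 1: supervised fine-tuning (log-likelihood ascent) on hypothetical annotated pairs.
    # The "haggling" label deliberately disagrees with PREFERRED, so RL has something to correct.
    annotated = [("opening", STRATEGIES.index("greet")),
                 ("haggling", STRATEGIES.index("propose_price"))]
    for _ in range(sft_epochs):
        for state, action in annotated:
            policy.update(state, action, 1.0, lr)

    # Stage 2: REINFORCE on self-play episodes with a moving-average reward baseline.
    baseline = 0.0
    for _ in range(rl_episodes):
        trajectory, reward = simulate_episode(policy, rng)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for state, action in trajectory:
            policy.update(state, action, advantage, lr)
    return policy


if __name__ == "__main__":
    policy = train()
    for state in STATES:
        print(state, policy.plan(state))
```

Running the module prints the greedy strategy chosen for each stage. Only the small policy is trained while the rest of the loop stays fixed, which mirrors the plug-and-play idea in miniature: moving to another application would mean substituting a differently trained plug-in. In the actual framework the plug-in is a tunable language model and the feedback comes from LLM-based simulation, not a lookup table and a scripted scorer.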