Proactive dialogue is a practical yet challenging problem in the era of large language models (LLMs), and dialogue policy planning is the key to improving the proactivity of LLMs. Most existing studies either elicit dialogue policy planning from LLMs through various prompting schemes or iteratively refine this capability on a given case using verbal AI feedback. However, these approaches are either bounded by the policy planning capability of the frozen LLM or difficult to transfer to new cases. In this work, we introduce a new dialogue policy planning paradigm that strategizes LLMs for proactive dialogue problems using a tunable language model plug-in as a plug-and-play dialogue policy planner, named PPDPP. Specifically, we develop a novel training framework that combines supervised fine-tuning on available human-annotated data with reinforcement learning from goal-oriented AI feedback, using dynamic interaction data collected through LLM-based self-play simulation. As a result, the LLM-powered dialogue agent not only generalizes to different cases after training but can also be applied to different applications simply by substituting the learned plug-in. In addition, we propose evaluating the policy planning capability of dialogue systems in an interactive setting. Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications: negotiation, emotional support, and tutoring dialogues.
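The plug-in idea can be illustrated with a toy sketch. This is not the PPDPP implementation: it assumes a context-free softmax policy over three hypothetical strategy labels, a stub in place of the frozen LLM, and a hard-coded reward in place of goal-oriented AI feedback from self-play. It only shows the division of labor, in which a small tunable policy picks a strategy, a frozen generator is conditioned on it, and only the policy is updated (here with a plain REINFORCE step).

```python
import math
import random

# Hypothetical strategy labels, chosen for illustration only.
STRATEGIES = ["ask_question", "make_offer", "show_empathy"]


class PolicyPlugin:
    """Tiny softmax policy over strategies; stands in for the tunable plug-in."""

    def __init__(self, n_actions, lr=0.1):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def probs(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng):
        return rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def reinforce(self, action, reward):
        # REINFORCE step: d log pi(a) / d logit_i = one_hot(a)_i - pi_i.
        p = self.probs()
        for i in range(len(self.logits)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[i] += self.lr * reward * grad


def frozen_llm(strategy, user_turn):
    """Placeholder for a frozen LLM prompted with the chosen strategy."""
    return f"[{strategy}] reply to: {user_turn}"


def simulated_reward(action):
    """Assumed stand-in for feedback from self-play: only one strategy pays off."""
    return 1.0 if STRATEGIES[action] == "make_offer" else 0.0


def train(steps=500, seed=0):
    rng = random.Random(seed)
    policy = PolicyPlugin(len(STRATEGIES))
    for _ in range(steps):
        action = policy.sample(rng)
        policy.reinforce(action, simulated_reward(action))
    return policy


if __name__ == "__main__":
    policy = train()
    p = policy.probs()
    best = STRATEGIES[p.index(max(p))]
    print(frozen_llm(best, "Can you lower the price?"))
```

Because the generator is never updated, swapping in a different trained `PolicyPlugin` changes the agent's behavior without touching the LLM, which is the plug-and-play property the abstract describes. A realistic planner would condition on the dialogue history rather than sampling from a fixed distribution.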