The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the ability of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making remains underexplored. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction, invoke external tools to collect valid information, and generate a plan that fulfills the user's demands. To study this practical problem, we establish a new benchmark dataset, Ask-before-Plan. To address the deficiency of LLMs in proactive planning, we propose a novel multi-agent framework, Clarification-Execution-Planning (\texttt{CEP}), which consists of three agents specialized in clarification, execution, and planning. We introduce a trajectory tuning scheme for the clarification agent and the static execution agent, as well as a memory recollection mechanism for the dynamic execution agent. Extensive evaluations and comprehensive analyses on the Ask-before-Plan dataset validate the effectiveness of the proposed framework.
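To make the division of labor among the three agents concrete, the following is a minimal sketch of a clarification, execution, and planning loop. It is not the authors' implementation: the abstract does not specify an API, so every name here (`State`, `REQUIRED`, `clarify`, `execute`, `plan`, `run`) is hypothetical, the travel-style details are a toy assumption, and the LLM-based agents are replaced by simple rule-based stand-ins. It also omits trajectory tuning and memory recollection, and it simplifies the task by clarifying from the conversation alone rather than from agent-environment interaction as well.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the details a plan needs (toy assumption).
REQUIRED = ("destination", "days")


@dataclass
class State:
    instruction: str
    details: Dict[str, str] = field(default_factory=dict)   # filled in by the user
    observations: List[str] = field(default_factory=list)   # results of tool calls


def clarify(state: State) -> Optional[str]:
    """Clarification agent (stub): name the first missing detail, if any."""
    for key in REQUIRED:
        if key not in state.details:
            return key
    return None


def execute(state: State, tools: Dict[str, Callable[[Dict[str, str]], str]]) -> None:
    """Execution agent (stub): call every tool and record what it returns."""
    for name, tool in tools.items():
        state.observations.append(f"{name}: {tool(state.details)}")


def plan(state: State) -> str:
    """Planning agent (stub): combine clarified details and observations."""
    facts = "; ".join(state.observations)
    return (f"Plan for {state.details['destination']} "
            f"({state.details['days']} days) based on: {facts}")


def run(instruction: str,
        ask_user: Callable[[str], str],
        tools: Dict[str, Callable[[Dict[str, str]], str]]) -> str:
    """Ask until nothing is missing, then gather information, then plan."""
    state = State(instruction)
    while (missing := clarify(state)) is not None:
        state.details[missing] = ask_user(f"Could you tell me the {missing}?")
    execute(state, tools)
    return plan(state)


if __name__ == "__main__":
    # Simulated user and a single fake tool, for illustration only.
    def ask_user(question: str) -> str:
        return "Kyoto" if "destination" in question else "3"

    tools = {"hotel_search": lambda d: f"hotels in {d['destination']}"}
    print(run("Plan a trip for me", ask_user, tools))
```

The key design point the sketch illustrates is ordering: the agent asks before it plans, so the planner only ever sees a fully specified request together with the information collected by tool calls.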