Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. Both cognitive neuroscience and reinforcement learning (RL) have proposed a number of interacting functional components that together implement search and evaluation in multi-step decision making. These components include conflict monitoring, state prediction, state evaluation, task decomposition, and orchestration. To improve planning with LLMs, we propose an agentic architecture, the Modular Agentic Planner (MAP), in which planning is accomplished via the recurrent interaction of these specialized modules, each implemented using an LLM. By breaking a larger problem down into multiple brief, automated LLM calls, the interacting modules yield more effective plans. We evaluate MAP on three challenging planning tasks -- graph traversal, Tower of Hanoi, and the PlanBench benchmark -- as well as an NLP task requiring multi-step reasoning (StrategyQA). We find that MAP yields significant improvements over both standard LLM methods (zero-shot prompting, in-context learning) and competitive baselines (chain-of-thought, multi-agent debate, and tree-of-thought), can be effectively combined with smaller and more cost-efficient LLMs (Llama3-70B), and displays superior transfer across tasks. These results suggest the benefit of a modular and multi-agent approach to planning with LLMs.
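The recurrent module interaction described above can be illustrated with a minimal sketch. This is not the paper's implementation: the module names and the toy graph-traversal task are hypothetical stand-ins, and each function below stubs what would be a separate LLM call in the real system.

```python
# Hypothetical sketch of a MAP-style control loop. In the actual architecture
# each module would be a distinct LLM query; here they are plain functions
# operating on a toy graph-traversal problem (all names are illustrative).

GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # toy task state space
GOAL = "D"

def predict_next_states(state):
    # State-prediction module: propose successor states from the current state.
    return GRAPH[state]

def evaluate_state(state):
    # State-evaluation module: score how promising a candidate state is
    # (here, a trivial goal check stands in for an LLM value judgment).
    return 1.0 if state == GOAL else 0.0

def monitor_conflict(plan):
    # Conflict-monitoring module: verify every transition in the plan is valid.
    return all(b in GRAPH[a] for a, b in zip(plan, plan[1:]))

def orchestrate(start, max_steps=10):
    # Orchestrator: recurrently query prediction and evaluation until the
    # goal is reached or the step budget runs out.
    path = [start]
    for _ in range(max_steps):
        if path[-1] == GOAL and monitor_conflict(path):
            return path
        candidates = predict_next_states(path[-1])
        if not candidates:
            return None  # dead end: no successors to expand
        # greedily commit to the highest-valued successor
        path.append(max(candidates, key=evaluate_state))
    return None

print(orchestrate("A"))
```

The greedy selection here is the simplest possible orchestration policy; the point of the sketch is only the division of labor, with proposal, evaluation, and validity checking handled by separate modules that the orchestrator invokes in a loop.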