Large-scale task planning is a major challenge. Recent work uses large language models (LLMs) directly as a policy and shows surprisingly interesting results. This paper shows that LLMs provide a commonsense model of the world in addition to a policy that acts on it. The world model and the policy can be combined in a search algorithm, such as Monte Carlo Tree Search (MCTS), to scale up task planning. In our new LLM-MCTS algorithm, the LLM-induced world model provides a commonsense prior belief that lets MCTS reason effectively, and the LLM-induced policy acts as a heuristic to guide the search, vastly improving search efficiency. Experiments show that LLM-MCTS outperforms both MCTS alone and policies induced by LLMs (GPT-2 and GPT-3.5) by a wide margin on complex, novel tasks. Further experiments and analyses on multiple tasks (multiplication, multi-hop travel planning, and object rearrangement) suggest minimum description length (MDL) as a general guiding principle: if the description length of the world model is substantially smaller than that of the policy, using the LLM as a world model for model-based planning is likely better than using the LLM solely as a policy.
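To make the division of labor concrete, the following is a minimal sketch of the idea, not the paper's implementation. It assumes a toy "find the apple" task, and the two dictionaries `BELIEF` and `POLICY_PRIOR` are hypothetical, hand-written stand-ins for what an LLM might return as a commonsense belief over object locations and as a prior over actions. The belief is used to sample a world state at the start of each simulation, and the action prior weights the exploration bonus in a PUCT-style selection rule; PUCT is chosen here for illustration and may differ from the selection rule the authors use.

```python
import math
import random

# Hypothetical stand-ins for LLM outputs (assumptions, not the paper's code).
BELIEF = {"fridge": 0.7, "cabinet": 0.2, "sofa": 0.1}        # prior over where the apple is
POLICY_PRIOR = {"fridge": 0.6, "cabinet": 0.3, "sofa": 0.1}  # prior over "open X" actions


class Node:
    """Search-tree node holding per-action visit counts and return sums."""

    def __init__(self):
        self.visits = 0
        self.value_sum = {a: 0.0 for a in POLICY_PRIOR}
        self.action_visits = {a: 0 for a in POLICY_PRIOR}
        self.children = {}


def puct_select(node, c=1.0):
    """Pick the action maximizing mean return plus a prior-weighted exploration bonus."""

    def score(action):
        n = node.action_visits[action]
        q = node.value_sum[action] / n if n else 0.0
        bonus = c * POLICY_PRIOR[action] * math.sqrt(node.visits + 1) / (1 + n)
        return q + bonus

    return max(POLICY_PRIOR, key=score)


def simulate(node, true_location, depth, max_depth=3, step_cost=0.1):
    """Run one simulation in a sampled world where the apple is at `true_location`."""
    if depth == max_depth:
        return 0.0
    action = puct_select(node)
    if action == true_location:
        ret = 1.0  # found the apple
    else:
        # The implicit observation is "not found", so one child per action suffices.
        child = node.children.setdefault(action, Node())
        ret = -step_cost + simulate(child, true_location, depth + 1, max_depth, step_cost)
    node.visits += 1
    node.action_visits[action] += 1
    node.value_sum[action] += ret
    return ret


def plan(num_simulations=200, seed=0):
    """Return the most-visited root action and the root node."""
    rng = random.Random(seed)
    root = Node()
    locations = list(BELIEF)
    weights = [BELIEF[loc] for loc in locations]
    for _ in range(num_simulations):
        # World-model step: sample a hidden state from the commonsense belief.
        true_location = rng.choices(locations, weights=weights)[0]
        simulate(root, true_location, depth=0)
    best = max(root.action_visits, key=root.action_visits.get)
    return best, root
```

Calling `plan()` returns the most-visited root action together with the root node, whose `action_visits` can be inspected to see how the prior shaped the search. In a real system the two dictionaries would be replaced by LLM queries conditioned on the task description and the action history.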