Large Language Models as Commonsense Knowledge for Large-Scale Task Planning

Large-scale task planning is a major challenge. Recent work exploits large language models (LLMs) directly as a policy and shows surprisingly interesting results. This paper shows that LLMs provide a commonsense model of the world in addition to a policy that acts on it. The world model and the policy can be combined in a search algorithm, such as Monte Carlo Tree Search (MCTS), to scale up task planning. In our new LLM-MCTS algorithm, the LLM-induced world model provides a commonsense prior belief for MCTS to achieve effective reasoning, while the LLM-induced policy acts as a heuristic to guide the search, vastly improving search efficiency. Experiments show that LLM-MCTS outperforms both MCTS alone and policies induced by LLMs (GPT2 and GPT3.5) by a wide margin on complex, novel tasks. Further experiments and analyses on multiple tasks (multiplication, multi-hop travel planning, and object rearrangement) suggest minimum description length (MDL) as a general guiding principle: if the description length of the world model is substantially smaller than that of the policy, using the LLM as a world model for model-based planning is likely better than using the LLM solely as a policy.
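
To make the division of labour in the abstract concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical object-search task in which an apple is hidden in one of three places. The dictionaries BELIEF and POLICY_PRIOR are hard-coded stand-ins for what an LLM might return as a commonsense belief over object locations and as an action heuristic; all names, numbers, and the PUCT-style selection rule are illustrative assumptions. Each simulation samples a world state from the belief, and the policy prior weights the exploration bonus during tree search.

```python
import math
import random

# Hypothetical stand-ins for LLM outputs (assumptions, not real model calls).
BELIEF = {"fridge": 0.7, "cabinet": 0.2, "sofa": 0.1}        # P(apple is here)
POLICY_PRIOR = {"fridge": 0.6, "cabinet": 0.3, "sofa": 0.1}  # P(open this next)
ACTIONS = list(BELIEF)
GAMMA = 0.9  # discount factor: finding the apple sooner is worth more


class Node:
    """Search-tree node for one history of failed 'open' actions."""

    def __init__(self):
        self.n = 0                            # visits to this node
        self.na = {a: 0 for a in ACTIONS}     # visits per action
        self.q = {a: 0.0 for a in ACTIONS}    # mean return per action
        self.children = {}                    # action -> Node


def puct(node, action, c=1.0):
    """Mean value plus an exploration bonus scaled by the policy prior."""
    bonus = c * POLICY_PRIOR[action] * math.sqrt(node.n + 1) / (1 + node.na[action])
    return node.q[action] + bonus


def simulate(node, location, tried):
    """Run one simulation against a sampled location; return the discounted return."""
    untried = [a for a in ACTIONS if a not in tried]
    action = max(untried, key=lambda a: puct(node, a))
    if action == location:
        ret = 1.0  # apple found, episode ends
    else:
        child = node.children.setdefault(action, Node())
        ret = GAMMA * simulate(child, location, tried | {action})
    # Back up the return with an incremental mean.
    node.n += 1
    node.na[action] += 1
    node.q[action] += (ret - node.q[action]) / node.na[action]
    return ret


def plan(num_simulations=500, seed=0):
    """Return the most-visited root action after searching."""
    rng = random.Random(seed)
    root = Node()
    locations, weights = zip(*BELIEF.items())
    for _ in range(num_simulations):
        # World-model role: sample a concrete state from the prior belief.
        location = rng.choices(locations, weights=weights)[0]
        simulate(root, location, frozenset())
    return max(ACTIONS, key=lambda a: root.na[a])


if __name__ == "__main__":
    print("first action:", plan())
```

In this sketch the belief only shapes which states are sampled, and the policy prior only shapes which branches are explored first, so the two roles stay separable. A real system would replace both dictionaries with model queries conditioned on the task and the history so far.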

