Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is specified in a structured format, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes in a natural language description of a planning problem and returns, in natural language, a correct (or optimal) plan for solving that problem. It does so in three steps: it first converts the language description into a file written in the Planning Domain Definition Language (PDDL), then uses a classical planner to quickly find a solution, and finally translates that solution back into natural language. Along with LLM+P, we define a diverse set of benchmark problems taken from common planning scenarios. Through a comprehensive set of experiments on these benchmark problems, we find that LLM+P provides optimal solutions for most problems, while LLMs alone fail to provide even feasible plans for most problems.\footnote{The code and results are publicly available at https://github.com/Cranial-XIX/llm-pddl.git.}
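The three-stage flow described above (natural language to PDDL, PDDL to plan, plan back to natural language) can be sketched as a small pipeline. This is only an illustrative sketch, not the code from the linked repository: the class `Pipeline`, its fields, and the three stub functions are hypothetical names, and the stubs return fixed placeholder values where a real system would call an LLM or a classical planner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical wrapper around the three stages; not the authors' API."""
    to_pddl: Callable[[str], str]        # natural language -> PDDL problem text
    plan: Callable[[str], List[str]]     # PDDL problem text -> ordered actions
    to_text: Callable[[List[str]], str]  # ordered actions -> natural language

    def solve(self, description: str) -> str:
        problem = self.to_pddl(description)  # stage 1: translate to PDDL
        actions = self.plan(problem)         # stage 2: search for a plan
        return self.to_text(actions)         # stage 3: translate the plan back


# Stubs standing in for the LLM and the planner (assumptions for illustration).
def stub_to_pddl(description: str) -> str:
    # A real system would prompt an LLM here; this is a fixed placeholder problem.
    return (
        "(define (problem demo) (:domain blocksworld) "
        "(:objects a b) "
        "(:init (ontable a) (ontable b) (clear a) (clear b) (handempty)) "
        "(:goal (on a b)))"
    )


def stub_plan(problem: str) -> List[str]:
    # A real system would invoke a classical planner on the PDDL text.
    return ["(pick-up a)", "(stack a b)"]


def stub_to_text(actions: List[str]) -> str:
    # A real system would ask an LLM to verbalize the plan.
    return "; ".join(f"Step {i}: {a}" for i, a in enumerate(actions, 1))


pipeline = Pipeline(stub_to_pddl, stub_plan, stub_to_text)
result = pipeline.solve("Put block a on block b.")
print(result)
```

Passing the stages in as callables keeps the LLM and the planner swappable, which mirrors the division of labor in the text: the LLM handles translation only, while the search is left to the planner.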