Large Language Models (LLMs) have demonstrated impressive planning abilities due to their vast "world knowledge". Yet obtaining plans that are both feasible (grounded in affordances) and cost-effective (short in plan length) remains a challenge despite recent progress. This contrasts with heuristic planning methods, which employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the strengths of both by pairing the world knowledge of LLMs with the principles of heuristic search. Our approach, SayCanPay, employs LLMs to generate actions (Say), guided by learnable domain knowledge that evaluates each action's feasibility (Can) and long-term reward or payoff (Pay), and uses heuristic search to select the best sequence of actions. Our contributions are (1) a novel framing of the LLM planning problem in the context of heuristic planning, (2) the integration of grounding and cost-effectiveness into the generated plans, and (3) the use of heuristic search over actions. Extensive evaluations show that SayCanPay outperforms other LLM planning approaches.
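The Say, Can, and Pay roles described above can be illustrated with a minimal sketch. The abstract does not state how the three scores are combined or which search procedure is used, so the sketch assumes a simple product of scores and a small beam search. Every name in it (`plan`, `propose`, `say`, `can`, `pay`, and the toy scorers) is hypothetical and stands in for components that would be an LLM and learned models in practice.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures (assumptions, not taken from the paper):
# a scorer maps (goal, history, action) to a score in [0, 1].
Scorer = Callable[[str, Sequence[str], str], float]
Proposer = Callable[[str, Sequence[str]], List[str]]


def plan(goal: str, propose: Proposer, say: Scorer, can: Scorer, pay: Scorer,
         beam_width: int = 2, max_steps: int = 10, stop: str = "done") -> List[str]:
    """Beam search over action sequences, scoring each step by say * can * pay.

    The product rule and beam search are assumptions made for illustration.
    """
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 1.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[Tuple[str, ...], float]] = []
        for history, score in beams:
            if history and history[-1] == stop:
                candidates.append((history, score))  # finished plan: carry over
                continue
            for action in propose(goal, history):
                step = (say(goal, history, action)
                        * can(goal, history, action)
                        * pay(goal, history, action))
                candidates.append((history + (action,), score * step))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(h and h[-1] == stop for h, _ in beams):
            break
    return list(beams[0][0])


# Toy stand-ins for the LLM (Say) and the learned Can and Pay models.
def toy_propose(goal: str, history: Sequence[str]) -> List[str]:
    return ["pick up key", "open door", "wander", "done"]


def toy_say(goal: str, history: Sequence[str], action: str) -> float:
    return 0.1 if action == "wander" else 0.3  # pretend LLM likelihood


def toy_can(goal: str, history: Sequence[str], action: str) -> float:
    if action == "pick up key":
        return 0.0 if "pick up key" in history else 1.0
    if action == "open door":
        ok = "pick up key" in history and "open door" not in history
        return 1.0 if ok else 0.0
    if action == "done":
        return 1.0 if "open door" in history else 0.0
    return 1.0  # wandering is always feasible


def toy_pay(goal: str, history: Sequence[str], action: str) -> float:
    return 0.2 if action == "wander" else 1.0  # wandering lengthens the plan


if __name__ == "__main__":
    print(plan("open the door", toy_propose, toy_say, toy_can, toy_pay))
```

In this toy domain the infeasible actions receive a Can score of zero and the wasteful action a low Pay score, so the search settles on the three-step plan `pick up key`, `open door`, `done`. Setting `beam_width=1` reduces the procedure to greedy action selection.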