Planning in textual environments has been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially observed environments where there is initially insufficient information to plan for the end-goal. We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98%) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4%).
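The iterative loop described above (observe, plan for a sub-goal, act, augment the representation, repeat) can be illustrated with a minimal toy sketch. This is not the PDDLEGO implementation: the room graph `WORLD`, the `COIN_ROOM` constant, and the functions `observe`, `plan`, and `run` are all hypothetical, and a breadth-first search over the known map stands in for both the LLM-generated representation and the symbolic planner. The sub-goal assumed here is "reach the nearest unvisited room".

```python
from collections import deque

# Hypothetical toy world (assumption, not from the paper):
# room -> neighbouring rooms. The coin's location is hidden from the agent.
WORLD = {
    "kitchen": ["hall"],
    "hall": ["kitchen", "study", "cellar"],
    "study": ["hall"],
    "cellar": ["hall", "vault"],
    "vault": ["cellar"],
}
COIN_ROOM = "vault"


def observe(room):
    """Partial observation: the exits of `room` and whether the coin is here."""
    return WORLD[room], room == COIN_ROOM


def plan(known_map, start, is_target):
    """BFS over the known map only; a stand-in for a symbolic planner."""
    queue, parent = deque([start]), {start: None}
    while queue:
        room = queue.popleft()
        if is_target(room):
            path = []
            while room is not None:
                path.append(room)
                room = parent[room]
            return path[::-1][1:]  # rooms to walk through, excluding the start
        for nxt in known_map.get(room, []):
            if nxt not in parent:
                parent[nxt] = room
                queue.append(nxt)
    return None  # no reachable target in the current representation


def run(start):
    """Alternate between augmenting the representation and partial planning."""
    known_map, here, trace = {}, start, [start]
    while True:
        exits, coin_here = observe(here)
        known_map[here] = exits          # augment the representation
        if coin_here:                    # end-goal achieved
            return trace
        # Sub-goal (assumed): reach the nearest room not yet visited.
        steps = plan(known_map, here, lambda r: r not in known_map)
        if steps is None:                # nothing left to explore
            return None
        trace.extend(steps)              # execute the partial plan
        here = steps[-1]


if __name__ == "__main__":
    print(run("kitchen"))
```

In this sketch each partial plan is computed only from rooms the agent has already seen, so the map grows one sub-goal at a time until the coin becomes observable.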