AutoGPT+P: Affordance-based Task Planning with Large Language Models

Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability, combining them with classical planning algorithms to compensate for the models' inherent limitations in reasoning. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and the objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects, by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an object-affordance mapping generated automatically using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98% on the SayCan instruction set, surpassing the 81% success rate of SayCan, the current state-of-the-art LLM-based planning method. Furthermore, we evaluated our approach on our newly created dataset of 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79%. The dataset and the code are publicly available at https://git.h2t.iar.kit.edu/birr/autogpt-p-standalone.
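
To illustrate how an object-affordance mapping might turn detected objects into symbolic facts for a planner, the following is a minimal Python sketch. It is not taken from the AutoGPT+P code base: the mapping entries, the affordance predicate names, and the function `scene_to_problem` are assumptions made for illustration only, and the output is merely PDDL-like.

```python
from typing import Dict, List, Set

# Hypothetical object-affordance mapping (values assumed for illustration;
# the abstract states that such a mapping is generated with ChatGPT).
OBJECT_AFFORDANCES: Dict[str, Set[str]] = {
    "cup": {"grasp", "contain", "pour"},
    "knife": {"grasp", "cut"},
    "table": {"support"},
}


def scene_to_problem(detected: List[str], oam: Dict[str, Set[str]]) -> str:
    """Build PDDL-like :objects and :init sections from detected object classes.

    Each detection becomes a uniquely named object (cup0, cup1, ...), and each
    affordance of its class becomes a unary fact about that object.
    """
    counts: Dict[str, int] = {}
    objects: List[str] = []
    facts: List[str] = []
    for cls in detected:
        index = counts.get(cls, 0)
        counts[cls] = index + 1
        name = f"{cls}{index}"
        objects.append(name)
        # Classes missing from the mapping simply contribute no facts.
        for affordance in sorted(oam.get(cls, set())):
            facts.append(f"({affordance} {name})")
    return "(:objects " + " ".join(objects) + ")\n(:init " + " ".join(facts) + ")"


if __name__ == "__main__":
    print(scene_to_problem(["cup", "cup", "table"], OBJECT_AFFORDANCES))
```

Because the facts are derived from affordances rather than from a fixed object list, a newly detected object class only needs an entry in the mapping to become usable by the symbolic planner.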
