AutoGPT+P: Affordance-based Task Planning with Large Language Models

Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability, combining them with classical planning algorithms to compensate for the LLMs' inherent limitations in reasoning. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and the objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects, by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an object-affordance mapping generated automatically using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. On the SayCan instruction set, our approach achieves a success rate of 98%, surpassing the 81% success rate of SayCan, the current state-of-the-art LLM-based planning method. Furthermore, we evaluated our approach on our newly created dataset of 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79%. The dataset and the code are publicly available at https://git.h2t.iar.kit.edu/birr/autogpt-p-standalone.
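
To make the idea of an affordance-based scene representation concrete, the following is a minimal sketch, not the AutoGPT+P implementation. It assumes a hand-written object-affordance mapping (the abstract describes this mapping as generated with ChatGPT) and hypothetical action names, affordance labels, and helper functions (`applicable_actions`, `suggest_alternatives`). It shows how the actions available in a scene could be derived from detected objects, and how an alternative could be suggested for a missing object.

```python
# Hypothetical object-affordance mapping. All entries are invented for
# illustration; they are not taken from the AutoGPT+P code or dataset.
OBJECT_AFFORDANCES = {
    "cup": {"grasp", "contain-liquid", "drink-from"},
    "glass": {"grasp", "contain-liquid", "drink-from"},
    "bottle": {"grasp", "contain-liquid", "pour-from"},
    "knife": {"grasp", "cut"},
    "table": {"support"},
}

# Hypothetical actions, each requiring one affordance on its target object.
ACTION_REQUIREMENTS = {
    "pick-up": "grasp",
    "pour": "pour-from",
    "drink": "drink-from",
    "cut": "cut",
    "place-on": "support",
}


def applicable_actions(detected_objects):
    """Map each detected object to the actions its affordances permit."""
    result = {}
    for obj in detected_objects:
        affordances = OBJECT_AFFORDANCES.get(obj, set())
        result[obj] = sorted(
            action
            for action, required in ACTION_REQUIREMENTS.items()
            if required in affordances
        )
    return result


def suggest_alternatives(missing_object, detected_objects, required_affordances):
    """Return detected objects that offer every required affordance."""
    return sorted(
        obj
        for obj in detected_objects
        if obj != missing_object
        and required_affordances <= OBJECT_AFFORDANCES.get(obj, set())
    )


if __name__ == "__main__":
    scene = ["glass", "bottle", "knife", "table"]
    print(applicable_actions(scene))
    # The user asks for a cup, but none was detected in the scene.
    print(suggest_alternatives("cup", scene, {"contain-liquid", "drink-from"}))
```

Because actions are tied to affordances rather than to object classes, a newly detected object needs only an entry in the mapping to become usable; no action has to be redefined.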
