In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. To address this challenge, this study applies Large Language Models (LLMs) to task and motion planning for robots. We propose a system architecture that coordinates multiple cognitive levels: reasoning, planning, and motion generation. At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans. We demonstrate the efficacy of the proposed feedback architecture, particularly its impact on executability, correctness, and time complexity, through empirical evaluation in a simulated blocks world and in two intricate real-world scenarios: barman and pizza preparation.