Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning, and robotic planning. Recently, LLMs have also been used to generate reward functions for low-level robot actions, effectively bridging high-level planning and low-level robot control. However, even with syntactically correct plans, robots can still fail to achieve their intended goals because of imperfect plans or unexpected environmental issues. Meanwhile, Vision Language Models (VLMs) have shown remarkable success in tasks such as visual question answering. Leveraging these capabilities, we present a novel framework called Robotic Replanning with Perception and Language Models (RePLan) that enables online replanning for long-horizon tasks. RePLan uses the physical grounding provided by a VLM's understanding of the world state to adapt robot actions when the initial plan fails to achieve the desired goal. To test our approach, we developed a Reasoning and Control (RC) benchmark with eight long-horizon tasks. We find that RePLan enables a robot to successfully adapt to unforeseen obstacles while accomplishing open-ended, long-horizon goals where baseline models cannot, and that it can be readily applied to real robots. Find more information at https://replan-lm.github.io/replan.github.io/