Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, large multi-modal foundation models for robotics, which can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks, have also seen growing adoption. Motivated by this range of capability, in this work we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method that leverages rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive zero-shot reasoning about affordances through keypoints, and we use these keypoints to define dense rewards that guide autonomous robotic learning. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion within 20K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching comparable performance within 35K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl