Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, large multi-modal foundation models have also seen growing adoption in robotics, as they can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capabilities, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method that leverages rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive zero-shot reasoning about affordances through keypoints, and we use these keypoints to define dense rewards that guide autonomous robotic learning. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 35K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl