Interactive grasping from clutter, akin to human dexterity, is one of the longest-standing problems in robot learning. Challenges stem from the intricacies of visual perception, the demand for precise motor skills, and the complex interplay between the two. In this work, we present Teacher-Augmented Policy Gradient (TAPG), a novel two-stage learning framework that combines reinforcement learning and policy distillation. After training a teacher policy to master motor control based on object pose information, TAPG facilitates guided, yet adaptive, learning of a sensorimotor policy based on object segmentation. We transfer zero-shot from simulation to a real robot by using the Segment Anything Model for promptable object segmentation. Our trained policies adeptly grasp a wide variety of objects from cluttered scenes, both in simulation and in the real world, based on human-understandable prompts. Furthermore, we show robust zero-shot transfer to novel objects. Videos of our experiments are available at \url{https://maltemosbach.github.io/grasp_anything}.