HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation

Manipulating objects without grasping them, referred to as non-prehensile manipulation, is an essential component of human dexterity. Non-prehensile manipulation may enable more complex interactions with objects, but it also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally abstracted, spatially grounded, object-centric action representation. Each action consists of a contact location selected from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task both in simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy generalizes to unseen object categories without a performance drop. It achieves an 89% success rate on unseen objects in simulation and a 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world toward challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
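The abstract names the two parts of a HACMan action: a discrete choice of contact point and continuous motion parameters. It does not spell out how they are combined. The following is a minimal sketch under stated assumptions. It assumes a per-point actor map that proposes motion parameters for every candidate contact point, and a per-point critic map that scores each (point, parameters) pair. The contact point is then chosen either greedily or by softmax sampling over those scores. The names `select_hybrid_action`, `actor_map`, `critic_map`, and `temperature` are hypothetical illustrations, not HACMan's actual code. The real networks, motion parameterization, and exploration scheme are not specified here.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
MotionParams = Tuple[float, ...]


def select_hybrid_action(
    points: Sequence[Point],
    actor_map: Callable[[Point], MotionParams],
    critic_map: Callable[[Point, MotionParams], float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[int, Point, MotionParams]:
    """Hypothetical sketch: pick a contact point (discrete part) and its
    motion parameters (continuous part) from per-point actor/critic maps.

    With rng=None the highest-scoring point is chosen greedily; otherwise a
    point is sampled from a softmax over the critic scores (an assumed
    exploration scheme, not necessarily the one HACMan uses).
    """
    if not points:
        raise ValueError("point cloud is empty")
    # Continuous part: the actor proposes motion parameters for every point.
    params = [actor_map(p) for p in points]
    # The critic scores each (contact point, motion parameters) pair.
    scores = [critic_map(p, a) for p, a in zip(points, params)]
    if rng is None:
        # Greedy: index of the first maximal score.
        idx = max(range(len(points)), key=scores.__getitem__)
    else:
        # Softmax sampling; subtracting the max keeps exp() from overflowing.
        m = max(scores)
        weights = [math.exp((s - m) / temperature) for s in scores]
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
    return idx, points[idx], params[idx]


if __name__ == "__main__":
    # Toy point cloud with a made-up actor (move toward the origin in x-y)
    # and a made-up critic (prefer contact points with larger x).
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.02), (-0.1, 0.05, 0.0)]
    actor = lambda p: (-p[0], -p[1], 0.0)
    critic = lambda p, a: p[0]
    print(select_hybrid_action(cloud, actor, critic))
```

Scoring every point and taking the best (or sampling near the best) keeps the discrete choice spatially grounded in the observed point cloud. The continuous parameters are computed per point, so each candidate contact location carries its own motion.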
