Robotic dexterous manipulation is challenging because multi-fingered robotic hands have high degrees of freedom (DoFs) and make complex contacts with objects. Many existing deep reinforcement learning (DRL) methods aim to improve sample efficiency in the high-dimensional output action space. However, existing works often overlook the role that representations of the complex input space play in generalizing a manipulation policy during hand-object interaction. In this paper, we propose DexRep, a novel hand-object interaction representation that captures object surface features and the spatial relations between hands and objects for dexterous manipulation skill learning. Based on DexRep, policies are learned for three dexterous manipulation tasks, namely grasping, in-hand reorientation, and bimanual handover, and extensive experiments are conducted to verify their effectiveness. In simulation, for grasping, the policy learned with 40 objects achieves a success rate of 87.9% on more than 5000 unseen objects of diverse categories, significantly surpassing existing work trained with thousands of objects; for the in-hand reorientation and handover tasks, the policies improve success rates and other metrics by 20% to 40% over existing hand-object representations. The grasp policies with DexRep are deployed in the real world under multi-camera and single-camera setups and demonstrate a small sim-to-real gap.