Human hands possess the dexterity to interact with diverse objects, for example by grasping specific parts of an object and/or approaching it from a desired direction. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions that follow a single objective, such as a desired heading direction during approach or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes, and dexterous hand morphologies in a policy learning framework, GraspXL. The objectives are composed of the graspable area, the heading direction during approach, the wrist rotation, and the hand position. Without requiring any 3D hand-object interaction data, our policy, trained with 58 objects, can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to the objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and can work with reconstructed or generated objects. We evaluate our method quantitatively and qualitatively to show its efficacy. Our model and code will be made available.
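To make the four motion objectives concrete, the following is a minimal sketch of how they might be bundled and checked. It is not the GraspXL implementation: the `GraspObjective` class, its field representations (a per-vertex mask for the graspable area, radians for the wrist rotation), and the `heading_error` helper are all hypothetical names and assumptions introduced here for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length; reject the zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


@dataclass(frozen=True)
class GraspObjective:
    """Hypothetical container for the four objectives named in the abstract."""

    graspable_area: Sequence[bool]  # assumed: one flag per object vertex
    heading_direction: Vec3         # desired approach direction
    wrist_rotation: float           # assumed: radians about the heading axis
    hand_position: Vec3             # desired hand position in the object frame


def heading_error(objective: GraspObjective, actual_heading: Vec3) -> float:
    """Angle in radians between the desired and the achieved heading."""
    target = _normalize(objective.heading_direction)
    actual = _normalize(actual_heading)
    cos_angle = sum(t * a for t, a in zip(target, actual))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


objective = GraspObjective(
    graspable_area=[True, True, False],
    heading_direction=(0.0, 0.0, 1.0),
    wrist_rotation=0.0,
    hand_position=(0.0, 0.0, 0.1),
)
aligned = heading_error(objective, (0.0, 0.0, 2.0))
sideways = heading_error(objective, (1.0, 0.0, 0.0))
```

Under these assumptions, varying any one field while holding the others fixed would request a different grasp of the same object, which is the sense in which adherence to objectives allows diverse grasps per object.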