Human hands possess the dexterity to interact with diverse objects, such as grasping specific parts of an object or approaching it from a desired direction. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions that follow a single objective, such as a desired approach direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their ability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes, and dexterous hand morphologies in a policy learning framework, GraspXL. The objectives comprise the graspable area, the heading direction during approach, the wrist rotation, and the hand position. Without requiring any 3D hand-object interaction data, our policy, trained on 58 objects, can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to the specified objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and can work with reconstructed or generated objects. We evaluate our method quantitatively and qualitatively to demonstrate its efficacy. Our model, code, and the large-scale generated motions are available at https://eth-ait.github.io/graspxl/.