To meet the demands of increasingly diverse dexterous hand hardware, it is crucial to develop a policy that enables zero-shot cross-embodiment grasping without re-learning for each new hand. Cross-embodiment alignment is challenging due to heterogeneous hand kinematics and physical constraints. Existing approaches typically predict intermediate motion targets and retarget them to each embodiment; this retargeting step can introduce errors and violate embodiment-specific limits, hindering transfer across diverse hands. To overcome these limitations, we propose \textit{DexGrasp-Zero}, a policy that learns universal grasping skills from diverse embodiments and transfers zero-shot to unseen hands. We first introduce a morphology-aligned graph representation that maps each hand's kinematic keypoints to anatomically grounded nodes and equips each node with tri-axial orthogonal motion primitives, enabling structural and semantic alignment across different morphologies. Building on this graph-based representation, we design a \textit{Morphology-Aligned Graph Convolutional Network} (MAGCN) to encode the graph for policy learning. MAGCN incorporates a \textit{Physical Property Injection} mechanism that fuses hand-specific physical constraints into the graph features, allowing the policy to adaptively compensate for varying link lengths and actuation limits and thus grasp precisely and stably. Extensive simulation evaluations on the YCB dataset demonstrate that our policy, jointly trained on four heterogeneous hands (Allegro, Shadow, Schunk, Ability), achieves an 85\% zero-shot success rate on unseen hardware (LEAP, Inspire), outperforming the state-of-the-art method by 59.5\%. In real-world experiments on three robot platforms (LEAP, Inspire, Revo2), our policy achieves an 82\% average success rate on unseen objects.
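The abstract does not specify the exact node set, how the tri-axial motion primitives are parameterized, or how Physical Property Injection fuses constraints into the features. The following is only a minimal sketch under stated assumptions, meant to illustrate the general idea: every hand is mapped onto one shared, hypothetical canonical node set (palm plus base/mid/tip nodes for five fingers), nodes a hand lacks are masked out, each node carries a 3-dimensional feature standing in for its tri-axial primitives, per-node physical properties (assumed here to be link length and joint range) modulate those features with a FiLM-style scale and shift (a swapped-in technique, not necessarily the authors' operator), and a standard symmetrically normalized graph convolution aggregates over the kinematic tree. All names (`NODES`, `inject_physical_properties`, `encode_hand`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical canonical node set shared by all hands; the real
# DexGrasp-Zero node definitions may differ.
FINGERS = ["thumb", "index", "middle", "ring", "little"]
SEGMENTS = ["base", "mid", "tip"]
NODES = ["palm"] + [f"{f}_{s}" for f in FINGERS for s in SEGMENTS]
INDEX = {name: i for i, name in enumerate(NODES)}


def canonical_edges():
    """Kinematic-tree edges: palm -> finger base -> mid -> tip."""
    edges = []
    for f in FINGERS:
        edges.append((INDEX["palm"], INDEX[f"{f}_base"]))
        edges.append((INDEX[f"{f}_base"], INDEX[f"{f}_mid"]))
        edges.append((INDEX[f"{f}_mid"], INDEX[f"{f}_tip"]))
    return edges


def normalized_adjacency(n, edges, mask):
    """GCN normalization D^-1/2 (A + I) D^-1/2, restricted to nodes present on this hand."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = mask[i]  # self-loop only for present nodes
    for i, j in edges:
        if mask[i] and mask[j]:
            a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inject_physical_properties(h, props, w_scale, w_shift):
    """Assumed FiLM-style stand-in for Physical Property Injection: each node's
    features are scaled and shifted by linear functions of its physical
    properties (here [link_length, joint_range])."""
    scale = matmul(props, w_scale)
    shift = matmul(props, w_shift)
    return [[h[i][c] * (1.0 + scale[i][c]) + shift[i][c] for c in range(len(h[0]))]
            for i in range(len(h))]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(a_hat, matmul(h, w))]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_hand(primitives, props, mask, params):
    """Inject physical properties, zero absent nodes, then run one GCN layer."""
    a_hat = normalized_adjacency(len(NODES), canonical_edges(), mask)
    h = inject_physical_properties(primitives, props, params["w_scale"], params["w_shift"])
    h = [[v * mask[i] for v in row] for i, row in enumerate(h)]
    return gcn_layer(a_hat, h, params["w_gcn"])


# Usage: a four-fingered hand (Allegro-like) simply masks out the little finger.
rng = random.Random(0)
params = {"w_scale": random_matrix(2, 3, rng),
          "w_shift": random_matrix(2, 3, rng),
          "w_gcn": random_matrix(3, 8, rng)}
mask = [0.0 if name.startswith("little") else 1.0 for name in NODES]
primitives = [[rng.uniform(-1, 1) for _ in range(3)] for _ in NODES]  # tri-axial features
props = [[rng.uniform(0.02, 0.06), rng.uniform(0.5, 1.6)] for _ in NODES]
features = encode_hand(primitives, props, mask, params)
print(len(features), len(features[0]))  # 16 8
```

Because every hand is expressed over the same node set, the same weights apply to any morphology; only the mask and the physical-property inputs change between hands, which is the kind of structural alignment the abstract describes.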