Generating high-quality dexterous grasps remains challenging for learning-based methods, which often depend on carefully tuned contact losses or costly contact-based test-time refinement. We present KPGrasp, a flow-matching framework that instead learns dexterous grasp priors from large-scale data, using neither contact losses nor contact-based test-time refinement. KPGrasp couples an all-Euclidean 3D hand-keypoint parameterization with a simple yet scalable Transformer flow model. The parameterization avoids the drawbacks of the conventional mixed SE(3) pose and joint-angle output space and expresses grasps in the same frame as the object point cloud, which enables native spatial reasoning; the Transformer flow model is trained with only the standard flow-matching loss and scales effectively with data, model capacity, and batch size. Experiments demonstrate state-of-the-art performance on two simulation benchmarks. On the Dexonomy benchmark, KPGrasp reaches a 76.3% grasp success rate, improving over the strongest directly comparable baseline by 47.4% while reducing penetration depth to 2.4 mm. The same model also achieves the best average performance on the DexGrasp Anything benchmark without fine-tuning. In batched inference, KPGrasp requires only 0.032 s per grasp. Finally, experiments on 20 diverse objects demonstrate that the pipeline can be deployed in a real-world setup.
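To make the training objective concrete, the following is a minimal NumPy sketch of a flow-matching loss over Euclidean keypoints, together with Euler sampling. It is an illustration under stated assumptions, not the KPGrasp implementation: it assumes the common linear interpolation path between Gaussian noise and data, and the names `flow_matching_loss`, `sample`, and `velocity_fn`, the array shape `(B, K, 3)`, and the step count are all hypothetical. In practice `velocity_fn` would be a Transformer conditioned on the object point cloud; here it is any callable.

```python
import numpy as np


def flow_matching_loss(velocity_fn, keypoints, rng):
    """Flow-matching loss on a linear noise-to-data path (an assumption).

    keypoints: (B, K, 3) array of hand keypoints, assumed to be expressed
    in the same frame as the object point cloud.
    velocity_fn: hypothetical callable (x_t, t) -> predicted velocity,
    with the same shape as x_t.
    """
    batch = keypoints.shape[0]
    noise = rng.standard_normal(keypoints.shape)   # x_0 ~ N(0, I)
    t = rng.uniform(size=(batch, 1, 1))            # one time per sample, in [0, 1)
    x_t = (1.0 - t) * noise + t * keypoints        # point on the straight path
    target = keypoints - noise                     # constant velocity along that path
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))    # plain regression, no contact terms


def sample(velocity_fn, shape, rng, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((shape[0], 1, 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

Because every output coordinate is a 3D point, the noise, the interpolation, and the regression target all live in one Euclidean space, so no special handling of rotations or joint limits appears in this sketch.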