We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a source hand grasping an object, our goal is to synthesize a functionally equivalent grasp for a target hand without requiring paired demonstrations or hand-specific simulation. We frame this problem as a stochastic transport between grasp distributions using the Schrödinger Bridge formalism. Our method learns to map between source and target latent grasp spaces via score and flow matching, conditioned on visual observations. To guide this translation, we introduce physics-informed cost functions that encode alignment in base pose, contact maps, wrench space, and manipulability. Experiments across diverse hand-object pairs demonstrate that our approach generates stable, physically grounded grasps with strong generalization. This work enables semantic grasp transfer for heterogeneous manipulators and bridges vision-based grasping with probabilistic generative modeling. Additional details are available at https://grasp2grasp.github.io/
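To make the stochastic-transport framing concrete, the following is a minimal sketch (not the paper's implementation) of the two ingredients the abstract names: sampling an intermediate point on a Brownian bridge between a source and target latent grasp code, and the conditional flow-matching regression target for the velocity field. Latent codes are represented as plain NumPy vectors; all function names and the noise scale `sigma` are illustrative assumptions.

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t on a Brownian bridge between a source latent grasp x0
    and a target latent grasp x1 (the diffusion path underlying a
    Schrödinger-bridge-style transport). `sigma` is an assumed noise scale."""
    mean = (1.0 - t) * x0 + t * x1          # linear interpolation of endpoints
    std = sigma * np.sqrt(t * (1.0 - t))    # bridge variance, zero at t=0 and t=1
    return mean + std * rng.standard_normal(x0.shape)

def flow_matching_target(x0, x1):
    """Conditional flow-matching regression target: the constant velocity
    of the straight path from x0 to x1. A network v_theta(x_t, t) would be
    trained to regress this target."""
    return x1 - x0

# Toy usage: 8-dimensional latent grasp codes for source and target hands.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)   # source-hand latent grasp (illustrative)
x1 = rng.standard_normal(8)   # target-hand latent grasp (illustrative)
xt = brownian_bridge_sample(x0, x1, t=0.5, sigma=0.1, rng=rng)
v = flow_matching_target(x0, x1)
```

The bridge collapses to the endpoints at t=0 and t=1, so the learned map is pinned to the source and target grasp distributions while remaining stochastic in between.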