A representation gap exists between grasp synthesis for rigid grippers and for soft grippers. AnyGrasp [1] and many other grasp synthesis methods are designed for rigid parallel grippers, and adapting them to soft grippers often fails to capture the grippers' compliant behavior, yielding data-intensive and inaccurate models. To bridge this gap, this paper proposes a novel framework that maps grasp poses from a rigid gripper model to a soft Fin-ray gripper. We use Conditional Flow Matching (CFM), a generative model, to learn this complex transformation. Our methodology includes a data-collection pipeline that generates paired rigid–soft grasp poses. A U-Net autoencoder conditions the CFM model on the object's geometry, extracted from a depth image, allowing it to learn a continuous mapping from an initial AnyGrasp pose to a stable Fin-ray gripper pose. We validate the approach on a 7-DOF robot: when executed by the soft gripper, CFM-generated poses achieve higher overall success rates on seen and unseen objects (34% and 46%, respectively) than the baseline rigid poses (6% and 25%). The improvement is largest for cylindrical objects (50% seen, 100% unseen) and spherical objects (25% seen, 31% unseen), and the model generalizes to unseen objects. This work presents CFM as a data-efficient and effective method for transferring grasp strategies, offering a scalable methodology for other soft robotic systems.
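The rigid-to-soft pose mapping that CFM learns can be illustrated with a minimal sketch. This is not the paper's implementation: the 7-D pose vectors, the pairing, and the oracle velocity field below are illustrative stand-ins (the actual model regresses a neural velocity field conditioned on U-Net depth features). It shows only the two core CFM operations: building the straight-line interpolation target used at training time, and Euler-integrating a velocity field from the rigid pose at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_target(x0, x1, t):
    """One training sample for conditional flow matching with a
    straight-line path: interpolate between the paired poses and use
    the constant velocity x1 - x0 as the regression target."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the probability path at time t
    v_t = x1 - x0                   # velocity the network should predict at x_t
    return x_t, v_t

def euler_integrate(v_field, x0, steps=50):
    """Transport an initial (rigid-gripper) pose through a velocity
    field from t = 0 to t = 1 with explicit Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for k in range(steps):
        x = x + dt * v_field(x, k * dt)
    return x

# Toy paired poses (3-D position + 4-D quaternion), purely illustrative.
x_rigid = rng.normal(size=7)                  # pose proposed for a rigid gripper
x_soft = x_rigid + 0.1 * rng.normal(size=7)   # paired soft-gripper pose

# With an oracle field equal to the true target velocity, integration
# carries the rigid pose onto the paired soft pose.
oracle = lambda x, t: x_soft - x_rigid
x_out = euler_integrate(oracle, x_rigid)
```

In the full method, `oracle` is replaced by a trained network whose input also includes the depth-image encoding, so the transported pose depends on the object's geometry rather than on a single memorized pair.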