Multi-finger grasping relies on high-quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three-, four-, and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M and showing a dramatic increase in model performance: the retrained model predicts grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.