This paper presents GenH2R, a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. The goal is to equip robots with the ability to reliably receive objects with unseen geometry handed over by humans along various complex trajectories. We acquire such generalizability by learning H2R handover at scale with a comprehensive solution comprising procedural simulation asset creation, automated demonstration generation, and effective imitation learning. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation to create an H2R handover simulation environment named \simabbns, which surpasses existing simulators in the number of scenes by three orders of magnitude. We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning. Finally, we present a 4D imitation learning method augmented by a future forecasting objective to distill demonstrations into a visuo-motor handover policy. Experimental evaluations in both simulators and the real world demonstrate significant improvements (at least +10\% success rate) over baselines in all cases. The project page is https://GenH2R.github.io/.