In this work, we tackle 6-DoF grasp detection for transparent and specular objects, an important yet challenging problem in vision-based robotic systems because depth cameras fail to sense the geometry of such objects. We propose the first multiview RGB-based 6-DoF grasp detection network, GraspNeRF, which leverages a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter. Compared to existing NeRF-based 3-DoF grasp detection methods, which rely on densely captured input images and time-consuming per-scene optimization, our system performs zero-shot NeRF construction from sparse RGB inputs and reliably detects 6-DoF grasps, both in real time. The proposed framework jointly learns the generalizable NeRF and grasp detection end to end, optimizing scene representation construction for grasping. For training, we generate a large-scale, photorealistic, domain-randomized synthetic dataset of grasping in cluttered tabletop scenes that enables direct transfer to the real world. Extensive experiments in synthetic and real-world environments demonstrate that our method significantly outperforms all baselines in all experiments while running in real time. The project page is available at https://pku-epic.github.io/GraspNeRF