We propose a new approach to 6-DoF grasp pose synthesis from 2D/2.5D input based on keypoints. A keypoint-based grasp detector operating on image input demonstrated promising results in a previous study, where the additional visual information provided by color images compensates for noisy depth perception. However, it relies heavily on accurately predicting the locations of keypoints in image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection and the scale towards the camera. We further redesign the keypoint output space to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite being trained on simple synthetic objects, our method demonstrates sim-to-real capacity by showing competitive results in real-world robot experiments.