We propose a new approach to 6-DoF grasp pose synthesis from 2D/2.5D input based on keypoints. A keypoint-based grasp detector operating on image input demonstrated promising results in a previous study, where the additional visual information provided by color images compensates for noisy depth perception. However, it relies heavily on accurately predicting the location of keypoints in image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection and the scale towards the camera. We further redesign the keypoint output space to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite being trained on simple synthetic objects, our method demonstrates sim-to-real capacity by showing competitive results in real-world robot experiments.
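As a rough illustration of why pixel-level keypoint noise matters when depth has to be recovered from image geometry alone, the sketch below uses a plain pinhole camera model. It is not the paper's network and not a full PnP solver; the function names, focal length, gripper width, and noise level are all assumed values chosen for illustration.

```python
def depth_from_keypoint_separation(focal_px, width_m, separation_px):
    """Pinhole model: separation_px = focal_px * width_m / depth_m, solved for depth_m."""
    return focal_px * width_m / separation_px


def depth_error_from_pixel_noise(focal_px, width_m, true_depth_m, noise_px):
    """Absolute depth error caused by a fixed pixel error in the keypoint separation."""
    true_separation_px = focal_px * width_m / true_depth_m
    noisy_depth_m = depth_from_keypoint_separation(
        focal_px, width_m, true_separation_px + noise_px
    )
    return abs(noisy_depth_m - true_depth_m)


# Assumed values: 600 px focal length, two keypoints 8 cm apart, 2 px of noise.
FOCAL_PX = 600.0
WIDTH_M = 0.08
NOISE_PX = 2.0

err_near = depth_error_from_pixel_noise(FOCAL_PX, WIDTH_M, 0.5, NOISE_PX)
err_far = depth_error_from_pixel_noise(FOCAL_PX, WIDTH_M, 1.0, NOISE_PX)

print(f"depth error at 0.5 m: {err_near:.4f} m")
print(f"depth error at 1.0 m: {err_far:.4f} m")
```

Under these assumptions, the same 2 px error produces a depth error roughly four times larger at 1.0 m than at 0.5 m. This is one way to see why an additional estimate of the scale towards the camera could reduce the reliance on precise keypoint locations.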