Robotic manipulation, and in-hand object manipulation in particular, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches to 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, which increases their reliance on visual features. As a result, they perform poorly on objects that lack such visual features or when those features are occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, this paper makes three contributions: (1) a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for geometrically informed 6D object pose estimation, (2) a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation, and (3) a method that accounts for proprioceptive information in the in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.
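To make the idea of message passing "within and across modalities" concrete, the following is a minimal sketch, not the architecture proposed in the paper. It assumes a toy setting in which vision nodes and touch nodes each carry a small feature vector, each node is updated with the unweighted mean of itself and its neighbors, and cross-modal links run from touch nodes to vision nodes. The names `aggregate`, `hierarchical_pass`, and `touch_to_vision` are hypothetical, and a learned model would replace the plain mean with trainable update functions.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Links = Dict[int, List[int]]  # target node index -> list of source node indices


def aggregate(target: List[Vector], source: List[Vector], links: Links) -> List[Vector]:
    """Update each target node with the mean of itself and its linked source nodes."""
    updated = []
    for i, own in enumerate(target):
        group = [own] + [source[j] for j in links.get(i, [])]
        updated.append([sum(v[d] for v in group) / len(group) for d in range(len(own))])
    return updated


def hierarchical_pass(
    vision: List[Vector],
    touch: List[Vector],
    vision_edges: Links,
    touch_edges: Links,
    touch_to_vision: Links,
) -> Tuple[List[Vector], List[Vector]]:
    """One illustrative round: intra-modality passing first, then cross-modality."""
    # Level 1: information flows within each modality's own graph.
    vision = aggregate(vision, vision, vision_edges)
    touch = aggregate(touch, touch, touch_edges)
    # Level 2: touch nodes send messages to the vision nodes they are linked to.
    vision = aggregate(vision, touch, touch_to_vision)
    return vision, touch


# Toy example (all values are made up for illustration).
vision_nodes = [[0.0, 0.0], [2.0, 2.0]]
touch_nodes = [[4.0, 0.0]]
fused_vision, fused_touch = hierarchical_pass(
    vision_nodes,
    touch_nodes,
    vision_edges={0: [1], 1: [0]},   # the two vision nodes are connected
    touch_edges={},                  # a single touch node has no neighbors
    touch_to_vision={0: [0]},        # vision node 0 receives from touch node 0
)
print(fused_vision)
```

In this toy run, only the vision node linked to the touch node changes in the second stage, which illustrates how a contact reading can refine the representation of a region that may be visually occluded.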