In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on a volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with state-of-the-art methods. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% in Intersection over Union (IoU) and achieve an 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm that explicitly completing the shape improves 6D pose estimation accuracy. Finally, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.
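The shape completion results above are reported in terms of IoU and Chamfer Distance. The following is a minimal sketch of how these two metrics are commonly computed. It assumes the volumetric shapes are occupancy grids stored as sets of occupied voxel indices, and it uses the symmetric, mean squared-distance variant of Chamfer Distance. The exact variant and normalization used in this work are not stated here, and the helper names `voxel_iou` and `chamfer_distance` are hypothetical. It is an illustration, not the authors' evaluation code.

```python
import math
from typing import Sequence, Set, Tuple

Voxel = Tuple[int, int, int]        # integer index of an occupied voxel
Point = Tuple[float, float, float]  # 3D point, e.g. sampled from a surface


def voxel_iou(pred: Set[Voxel], target: Set[Voxel]) -> float:
    """Intersection over Union of two occupancy grids given as sets of occupied voxels."""
    union = pred | target
    if not union:
        # Both grids empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return len(pred & target) / len(union)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance (assumed variant): mean nearest-neighbour
    squared distance from a to b plus the same quantity from b to a."""

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Brute-force nearest neighbour; O(len(src) * len(dst)), fine for a sketch.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    target = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
    print(voxel_iou(pred, target))  # 2 shared voxels / 4 in the union = 0.5
    print(chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 1, 0)]))  # 0.5 + 0.5 = 1.0
```

Higher IoU and lower Chamfer Distance both indicate a completed shape closer to the ground truth. Practical implementations usually replace the brute-force nearest-neighbour search with a KD-tree or a GPU batch operation.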