Estimating the 6D pose of an object is an essential task in many applications. Because they lack depth information, existing RGB-based methods are sensitive to occlusion and illumination changes. Extracting and exploiting the geometric features in depth information is therefore crucial for accurate predictions. To this end, we propose TransPose, a novel 6D pose framework that uses a Transformer encoder with a geometry-aware module to learn better point cloud feature representations. Specifically, we first uniformly sample the point cloud and extract local geometric features with a purpose-built local feature extractor based on a graph convolution network. To improve robustness to occlusion, we adopt a Transformer to exchange global information, so that each local feature also carries global information. Finally, we introduce a geometry-aware module into the Transformer encoder, which forms an effective constraint on point cloud feature learning and couples the global information exchange more tightly with point cloud tasks. Extensive experiments indicate the effectiveness of TransPose: our pose estimation pipeline achieves competitive results on three benchmark datasets.
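The three stages named in the abstract (uniform sampling, graph-based local feature extraction, and attention-based global exchange with a geometric constraint) can be illustrated with a minimal, weight-free sketch. It is not the TransPose implementation: the function names, the k-nearest-neighbour max-pooling used as a local descriptor, and the distance penalty used as a stand-in for the geometry-aware module are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def sample_points(points, m, seed=0):
    """Uniformly sample m points without replacement (hypothetical helper)."""
    return random.Random(seed).sample(points, m)


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def local_features(points, k):
    """Toy graph-style descriptor: the point's coordinates concatenated with
    the per-axis maximum of its neighbour offsets. A real graph convolution
    would apply learned weights before pooling; none are used here."""
    feats = []
    for p, nbrs in zip(points, knn_indices(points, k)):
        offsets = [[q - c for q, c in zip(points[j], p)] for j in nbrs]
        pooled = [max(axis) for axis in zip(*offsets)]
        feats.append(list(p) + pooled)
    return feats


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_exchange(points, feats, geo_weight=1.0):
    """Single-head self-attention without learned projections.
    ASSUMPTION: the geometric constraint is modelled as a penalty on the
    Euclidean distance between points, subtracted from the attention logits.
    This is an illustrative stand-in, not the paper's geometry-aware module."""
    dim = len(feats[0])
    fused = []
    for i, fi in enumerate(feats):
        logits = [
            sum(a * b for a, b in zip(fi, fj)) / math.sqrt(dim)
            - geo_weight * math.dist(points[i], points[j])
            for j, fj in enumerate(feats)
        ]
        weights = softmax(logits)
        fused.append([
            sum(w * fj[c] for w, fj in zip(weights, feats)) for c in range(dim)
        ])
    return fused


# Usage on a synthetic cloud of 200 random points in the unit cube.
rng = random.Random(1)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
pts = sample_points(cloud, 32)
feats = local_features(pts, k=8)
fused = global_exchange(pts, feats)
```

After the exchange step, every output row is a weighted average of all local descriptors, which is the sense in which each local feature comes to contain global information. The distance penalty makes nearby points contribute more; setting `geo_weight=0.0` reduces the step to plain dot-product attention.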