In this paper, we propose SABER, a novel encoder-decoder architecture that learns the 6D pose of an object in an embedding space by learning a shape representation at a given pose. The model learns pose from an RGB image by representing the object's shape at the target pose. Shape representation serves as an auxiliary task that helps us learn the rotation space of an object from 2D images: an image encoder predicts the rotation in the embedding space, and a DeepSDF-based decoder learns to represent the object's shape at that pose. Because our approach is shape-based, the pipeline applies to objects of any type regardless of symmetry. Moreover, training SABER requires only a CAD model of each object. The pipeline is trained on synthetic data and handles symmetric objects without symmetry labels, so no additional labeled training data is needed. The experimental evaluation shows that our method achieves results close to the benchmarks for both symmetric and asymmetric objects on the Occlusion-LineMOD and T-LESS datasets.
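The core idea of representing a shape at a given pose can be illustrated with a signed distance function: querying the canonical SDF at points rotated back into the canonical frame yields the shape at pose R. The sketch below is a simplified stand-in for the learned DeepSDF decoder, using an analytic box SDF and a hypothetical rotation; it is not the paper's implementation, only an illustration of pose-conditioned shape representation.

```python
import numpy as np

def box_sdf(p, half_extents):
    # Signed distance from point p to an axis-aligned box at the origin
    # (negative inside, positive outside).
    q = np.abs(p) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0))
    inside = min(max(q[0], max(q[1], q[2])), 0.0)
    return outside + inside

def rot_z(theta):
    # Rotation about the z-axis by angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def posed_sdf(p, R, half_extents):
    # Shape at pose R: query the canonical SDF at the point
    # rotated back into the canonical frame.
    return box_sdf(R.T @ p, half_extents)

half = np.array([1.0, 0.2, 0.2])  # long thin box (asymmetric under rot_z)
p = np.array([0.0, 0.9, 0.0])     # outside the canonical box
R = rot_z(np.pi / 2)              # box rotated 90 degrees about z

d_canonical = box_sdf(p, half)    # positive: p lies outside the canonical box
d_posed = posed_sdf(p, R, half)   # negative: p lies inside the rotated box
```

Because the query point's sign changes with R, the decoder output carries a supervision signal on rotation, which is how the auxiliary shape task can teach the encoder a rotation embedding.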