We present a novel meta-learning approach for 6D pose estimation of unknown objects. In contrast to ``instance-level'' and ``category-level'' pose estimation methods, our algorithm learns object representations in a category-agnostic way, which endows it with strong generalization capabilities across object categories. Specifically, we employ a neural process-based meta-learning approach to train an encoder that captures the texture and geometry of an object in a latent representation, based on very few RGB-D images and ground-truth keypoints. The latent representation is then used by a simultaneously meta-trained decoder to predict the 6D pose of the object in new images. Furthermore, we propose a novel geometry-aware decoder for keypoint prediction using a Graph Neural Network (GNN), which explicitly takes the geometric constraints specific to each object into consideration. To evaluate our algorithm, we conduct extensive experiments on the \linemod dataset and on our new fully annotated synthetic datasets generated from Multiple Categories in Multiple Scenes (MCMS). Experimental results demonstrate that our model performs well on unseen objects with very different shapes and appearances. Remarkably, our model also shows robust performance on occluded scenes even though it is trained entirely on data without occlusion. To our knowledge, this is the first work exploring \textbf{cross-category level} 6D pose estimation.