6D pose estimation aims to determine the object pose that best explains the camera observation. For a non-ambiguous object this solution is unique, but it becomes a multi-modal pose distribution when the object is symmetrical or when, depending on the viewpoint, symmetry-breaking elements are occluded. Current 6D pose estimation methods are benchmarked on datasets whose ground-truth annotations treat visual ambiguities as arising only from global object symmetries, whereas ambiguities should be defined per image to account for the camera viewpoint. We therefore first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the visibility of the object surface in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate state-of-the-art single-pose methods and show that it substantially changes their ranking. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making this the first benchmark for pose distribution methods on real images.
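To make the idea of scoring a set of estimated poses against an image-wise set of ground-truth poses concrete, the following is a minimal sketch and not the formulation derived in this work. It assumes each distribution is represented by a finite list of `(R, t)` samples, and that two poses match when their rotation geodesic angle and translation distance fall below thresholds. The function names (`rotation_angle`, `is_match`, `precision_recall`, `rot_z`) and the threshold values are hypothetical choices made for illustration.

```python
import numpy as np


def rotation_angle(Ra, Rb):
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def is_match(pose_a, pose_b, max_angle, max_dist):
    """Assumed matching rule: both rotation and translation errors are small."""
    Ra, ta = pose_a
    Rb, tb = pose_b
    return (rotation_angle(Ra, Rb) <= max_angle
            and float(np.linalg.norm(ta - tb)) <= max_dist)


def precision_recall(estimates, ground_truth,
                     max_angle=np.deg2rad(5.0), max_dist=0.01):
    """Precision: share of estimates close to some ground-truth pose.
    Recall: share of ground-truth poses close to some estimate."""
    if not estimates or not ground_truth:
        return 0.0, 0.0
    hits_est = [any(is_match(e, g, max_angle, max_dist) for g in ground_truth)
                for e in estimates]
    hits_gt = [any(is_match(e, g, max_angle, max_dist) for e in estimates)
               for g in ground_truth]
    return float(np.mean(hits_est)), float(np.mean(hits_gt))


def rot_z(deg):
    """Rotation of `deg` degrees about the z axis."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


# Toy case: an object whose visible surface leaves a 4-fold ambiguity about z.
t = np.zeros(3)
ground_truth = [(rot_z(d), t) for d in (0, 90, 180, 270)]
estimates = [(rot_z(d), t) for d in (0, 90, 45)]  # the 45 degree pose is wrong
precision, recall = precision_recall(estimates, ground_truth)
```

In this toy case two of the three estimates match a ground-truth mode and two of the four modes are recovered. A method that returns a single correct pose would reach full precision but only partial recall under this rule, which is the kind of distinction a set-based metric is meant to expose.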