We present a novel technique for estimating the 6D pose of objects from a single image when the object's 3D geometry is known only approximately rather than as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model estimates the pixel-wise coordinate error, which lets us discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses of the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation with which we estimate the probability of each hypothesis and select the most likely one. As we show in experiments, our approach can handle extreme visual conditions, including overexposure, high contrast, and low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
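To make two of the steps above more concrete (discarding correspondences by their predicted error, and ranking pose hypotheses by a pixel-wise posterior), here is a minimal Python sketch. It is not the authors' formulation. It assumes a binary silhouette rendered for each hypothesis and per-pixel foreground/background probabilities from some appearance model, and it borrows a classic PWP3D-style likelihood with a hard Heaviside. The function names, the threshold `max_error` and the constant `eps` are hypothetical. Pose computation from correspondences (e.g. PnP) and the region-based refinement are omitted.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]
Point3D = Tuple[float, float, float]


def filter_correspondences(
    pixels: Sequence[Pixel],
    coords: Sequence[Point3D],
    errors: Sequence[float],
    max_error: float,
) -> List[Tuple[Pixel, Point3D]]:
    """Keep only the 2D-3D pairs whose predicted coordinate error is below max_error.

    max_error is a hypothetical threshold, not a value from the paper.
    """
    return [(p, c) for p, c, e in zip(pixels, coords, errors) if e < max_error]


def log_posterior(
    mask: Sequence[int],
    p_fg: Sequence[float],
    p_bg: Sequence[float],
    eps: float = 1e-9,
) -> float:
    """Sum over pixels of log(H * P_f + (1 - H) * P_b).

    mask is the flattened silhouette of one hypothesis (H = 1 inside, 0 outside);
    p_fg / p_bg are per-pixel foreground / background probabilities.
    eps guards against log(0).
    """
    total = 0.0
    for h, pf, pb in zip(mask, p_fg, p_bg):
        total += math.log(h * pf + (1 - h) * pb + eps)
    return total


def select_hypothesis(
    masks: Sequence[Sequence[int]],
    p_fg: Sequence[float],
    p_bg: Sequence[float],
) -> Tuple[int, List[float]]:
    """Score every hypothesis silhouette and return the index of the most likely one."""
    scores = [log_posterior(m, p_fg, p_bg) for m in masks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Two correspondences; only the first has a low predicted error.
    kept = filter_correspondences(
        [(0, 0), (0, 1)], [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0)], [0.01, 0.5], max_error=0.1
    )
    print(kept)

    # A 2x2 image flattened to 4 pixels; the first two pixels look like foreground.
    p_fg = [0.9, 0.8, 0.1, 0.2]
    p_bg = [0.1, 0.2, 0.9, 0.8]
    best, scores = select_hypothesis([[1, 1, 0, 0], [0, 0, 1, 1]], p_fg, p_bg)
    print(best, scores)
```

Working in log space keeps the product of many per-pixel probabilities numerically stable; a smooth Heaviside over a signed distance function, as in region-based trackers, would replace the hard 0/1 mask in a fuller version.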