We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates, rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest neighbor search in feature space, resulting in a speedup factor of 38x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose.
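
As a rough illustration of the template-matching step, the sketch below retrieves the closest template for a query descriptor by nearest neighbor search under cosine similarity. It is not the GigaPose implementation: the function names `l2_normalize` and `nearest_template`, the descriptor dimension, the number of templates, and the random features standing in for learned descriptors are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def nearest_template(query_feat, template_feats):
    """Return (index, cosine similarity) of the best-matching template.

    query_feat:     (D,) descriptor of the input image crop (hypothetical).
    template_feats: (N, D) descriptors, one per rendered template (hypothetical).
    """
    q = l2_normalize(np.asarray(query_feat, dtype=np.float64))
    t = l2_normalize(np.asarray(template_feats, dtype=np.float64))
    sims = t @ q  # cosine similarity of the query to every template
    best = int(np.argmax(sims))
    return best, float(sims[best])


# Toy data: random vectors stand in for learned features (assumption).
rng = np.random.default_rng(0)
templates = rng.normal(size=(100, 64))
# The query is a rescaled, slightly noisy copy of template 17.
query = 3.0 * templates[17] + 0.01 * rng.normal(size=64)

idx, score = nearest_template(query, templates)
```

In this toy setup each template would correspond to one sampled out-of-plane viewpoint, so the retrieved index selects a viewpoint; the four remaining parameters would then be estimated separately from patch correspondences, which this sketch does not cover.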