We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest-neighbor search in feature space, resulting in a 35x speedup compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D object pose estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose
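The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that template and query features are plain vectors compared by cosine similarity, and that the four remaining parameters are a 2D scale, an in-plane rotation, and a 2D translation, fitted here by least squares from two or more patch correspondences. The function names `nearest_template` and `fit_similarity_2d` are hypothetical, and GigaPose itself may estimate these quantities differently.

```python
import math


def nearest_template(query, template_feats):
    """Return the index of the template feature most similar to the query.

    Uses brute-force cosine similarity; a real system would likely use an
    optimized nearest-neighbor index instead (assumption).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    return max(range(len(template_feats)),
               key=lambda i: cosine(query, template_feats[i]))


def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity transform mapping src points to dst points.

    Points are treated as complex numbers, so the model is dst ~ a * src + b,
    where a = scale * exp(i * angle) and b is the translation.
    Returns (scale, angle_in_radians, (tx, ty)).
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    n = len(p)
    p_mean = sum(p) / n
    q_mean = sum(q) / n
    # Closed-form least-squares solution for the complex gain a.
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return abs(a), math.atan2(a.imag, a.real), (b.real, b.imag)


# Toy usage: pick the best template, then fit the four in-plane parameters.
best = nearest_template([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
template_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The same points after scaling by 2, rotating by 90 degrees, and shifting by (3, 4).
image_pts = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
scale, angle, (tx, ty) = fit_similarity_2d(template_pts, image_pts)
```

In this toy example the fit recovers a scale of about 2, an angle of about π/2, and a translation of about (3, 4). Sampling templates over only the two out-of-plane angles works because the in-plane rotation is absorbed by the 2D transform fitted in the second step.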