This paper proposes a new method for accurate and robust 6D pose estimation of novel objects, named GS2Pose. By introducing 3D Gaussian splatting (3DGS), GS2Pose can exploit reconstruction results without requiring a high-quality CAD model, meaning it needs only segmented RGB-D images as input. Specifically, GS2Pose employs a two-stage structure consisting of coarse estimation followed by refined estimation. In the coarse stage, a lightweight U-Net with a polarization attention mechanism, called Pose-Net, is designed. Supervised by the 3DGS model during training, Pose-Net generates NOCS images from which a coarse pose is computed. In the refinement stage, GS2Pose formulates a pose regression algorithm, referred to as GS-Refiner, following the idea of reprojection, i.e., bundle adjustment (BA). By extending 3DGS with Lie algebra, GS-Refiner obtains a rendering pipeline that is differentiable with respect to pose and refines the coarse estimate by comparing the input images against the rendered images. GS-Refiner also selectively updates parameters of the 3DGS model to adapt to the environment, enhancing the algorithm's robustness and flexibility under illumination changes, occlusion, and other challenging disruptive factors. GS2Pose was evaluated on the LineMod dataset, where comparisons with similar algorithms yielded highly competitive results. The code for GS2Pose will soon be released on GitHub.
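The core of the refinement idea, parameterizing a pose update as an se(3) twist and solving a nonlinear least-squares problem, can be illustrated with a toy sketch. Note the simplifications: the sketch below minimizes 3D point residuals with a Gauss-Newton solver and numerical Jacobians, whereas the paper's GS-Refiner minimizes a photometric discrepancy between input images and pose-differentiable 3DGS renderings; all function names here are illustrative, not from the paper.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector twist xi = (rho, phi) -> 4x4 SE(3) matrix."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    if theta > 1e-12:
        K = skew(phi / theta)  # unit-axis version of Rodrigues' formula
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
        V = (np.eye(3) + ((1 - np.cos(theta)) / theta) * K
             + ((theta - np.sin(theta)) / theta) * K @ K)
    else:
        R, V = np.eye(3), np.eye(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def residuals(xi, pts, obs):
    """Difference between points transformed by exp(xi) and the observations."""
    T = se3_exp(xi)
    pred = pts @ T[:3, :3].T + T[:3, 3]
    return (pred - obs).ravel()

def refine_pose(xi, pts, obs, iters=10, eps=1e-6):
    """Gauss-Newton refinement of the twist xi (toy stand-in for GS-Refiner)."""
    for _ in range(iters):
        r = residuals(xi, pts, obs)
        # Central-difference Jacobian of the residual vector w.r.t. xi.
        J = np.empty((r.size, 6))
        for k in range(6):
            d = np.zeros(6)
            d[k] = eps
            J[:, k] = (residuals(xi + d, pts, obs)
                       - residuals(xi - d, pts, obs)) / (2 * eps)
        # Damped normal equations (Levenberg-style regularization).
        xi = xi + np.linalg.solve(J.T @ J + 1e-9 * np.eye(6), -J.T @ r)
    return xi

# Toy setup: a "coarse" identity pose refined toward a nearby ground-truth pose.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (50, 3))  # object points (NOCS-like coordinates)
xi_true = np.array([0.10, -0.05, 0.20, 0.05, -0.02, 0.03])
T_true = se3_exp(xi_true)
obs = pts @ T_true[:3, :3].T + T_true[:3, 3]
xi_refined = refine_pose(np.zeros(6), pts, obs)
```

The design point this sketch shares with GS-Refiner is that the pose lives on the SE(3) manifold but is optimized through a locally linear twist parameterization, which keeps every iterate a valid rigid transform; the paper replaces the known 3D correspondences here with gradients flowing through the differentiable 3DGS renderer.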