We present iComMa, a method for 6D camera pose estimation in computer vision. Conventional pose estimation methods typically rely on a CAD model of the target or require network training tailored to particular object classes. Some existing methods achieve promising results in mesh-free object and scene pose estimation by inverting Neural Radiance Fields (NeRF), yet they still struggle with adverse initializations such as large rotations and translations. To address this issue, we propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS). Specifically, a gradient-based differentiable framework optimizes the camera pose by minimizing the residual between the query image and the rendered image, requiring no training. An end-to-end matching module enhances the model's robustness to adverse initializations, while minimizing a pixel-level comparison loss enables precise pose estimation. Experimental results on synthetic and complex real-world data demonstrate the effectiveness of the proposed approach under challenging conditions and the accuracy of its camera pose estimation.
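The render-and-compare optimization at the heart of this approach can be illustrated with a minimal one-dimensional sketch. The `render` function below is a hypothetical stand-in for 3DGS rasterization (a Gaussian blob whose position depends on a single pose parameter `t`); only the gradient-descent structure, minimizing a pixel-level residual between the query image and the rendered image, mirrors the method described in the abstract:

```python
import numpy as np

# Pixel grid and a toy "renderer": a 1-D Gaussian blob whose position is
# controlled by the scalar pose parameter t. This is a hypothetical stand-in
# for differentiable 3DGS rendering, used only to show the loop structure.
xs = np.linspace(-5.0, 5.0, 200)

def render(t):
    return np.exp(-(xs - t) ** 2)

def render_grad(t):
    # Analytic derivative of the rendered "image" w.r.t. the pose parameter t.
    return 2.0 * (xs - t) * np.exp(-(xs - t) ** 2)

query = render(1.5)   # query image captured at the (unknown) true pose t = 1.5
t = 0.0               # initial pose estimate

# Gradient descent on the pixel-level squared residual between the rendered
# and query images -- the render-and-compare loop, with no network training.
lr = 0.01
for _ in range(200):
    residual = render(t) - query
    grad = 2.0 * np.sum(residual * render_grad(t))
    t -= lr * grad

print(round(t, 3))  # recovers the true pose, 1.5
```

Note that if `t` were initialized far from the true pose, the two blobs would barely overlap and the photometric gradient would nearly vanish; this is the failure mode under adverse initializations that motivates the end-to-end matching module.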