Although various visual localization approaches exist, such as scene coordinate regression and pose regression, these methods often struggle with high memory consumption or extensive optimization requirements. To address these challenges, we utilize recent advancements in novel view synthesis, particularly 3D Gaussian Splatting (3DGS), to enhance localization. Through its spatial features, 3DGS compactly encodes both 3D geometry and scene appearance. Our method leverages the dense description maps produced by XFeat, a lightweight keypoint detection and description model. We propose distilling these dense keypoint descriptors into 3DGS to improve the model's spatial understanding, leading to more accurate camera pose predictions through 2D-3D correspondences. After estimating an initial pose, we refine it using a photometric warping loss. Benchmarking on popular indoor and outdoor datasets shows that our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.
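The photometric warping loss mentioned above can be illustrated with a minimal sketch: back-project target pixels to 3D using a depth map and intrinsics, transform them by the estimated relative pose, reproject into the source view, and compare sampled intensities. This is a generic NumPy sketch of the standard warping formulation, not the paper's implementation; the function names and the nearest-neighbour sampling are assumptions made for brevity.

```python
import numpy as np

def warp_pixels(depth, K, R, t):
    # Hypothetical helper: back-project target pixels to 3D with the depth map,
    # apply the relative pose (R, t), then project into the source camera.
    h, w = depth.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(float)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # 3D points, target frame
    pts_src = R @ pts + t.reshape(3, 1)                  # 3D points, source frame
    proj = K @ pts_src
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)        # perspective divide
    return uv.reshape(2, h, w)

def photometric_warp_loss(img_src, img_tgt, depth, K, R, t):
    # Sample the source image at the warped coordinates (nearest neighbour here
    # for brevity; bilinear sampling is the usual differentiable choice) and
    # average an L1 photometric error over pixels that land inside the image.
    uv = warp_pixels(depth, K, R, t)
    u = np.round(uv[0]).astype(int)
    v = np.round(uv[1]).astype(int)
    h, w = depth.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros_like(img_tgt)
    warped[valid] = img_src[v[valid], u[valid]]
    return np.abs(warped[valid] - img_tgt[valid]).mean()
```

With an identity pose the warp maps every pixel to itself, so the loss between an image and itself is zero; minimizing this loss over the pose parameters is what refines the initial estimate.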