Pose refinement is a practically relevant research direction. It can be used (1) to obtain a more accurate pose estimate from an initial prior (e.g., from retrieval), (2) as pre-processing, i.e., to provide a better starting point for a more expensive pose estimator, or (3) as post-processing for a more accurate localizer. Existing approaches focus on learning features or scene representations specifically for the pose refinement task. This involves training an implicit scene representation, or learning features while optimizing a camera-pose-based loss. A natural question is whether training task-specific features or representations is truly necessary, or whether similar results can already be achieved with more generic features. In this work, we present a simple approach that combines pre-trained features with a particle filter and a renderable representation of the scene. Despite its simplicity, it achieves state-of-the-art results, demonstrating that one can build a pose refiner without any task-specific training. The code is available at https://github.com/ga1i13o/mcloc_poseref
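To make the combination of pre-trained features, a particle filter, and a renderable scene representation concrete, the following is a minimal sketch of the general idea and not the authors' implementation (see the repository above for that). It assumes a pose can be written as a flat parameter vector, and it takes the renderer and the feature extractor as hypothetical callables `render_fn` and `feat_fn`; the function name `refine_pose`, the Gaussian perturbation, the softmax weighting, and all default values are illustrative assumptions.

```python
import numpy as np


def refine_pose(query_feat, init_pose, render_fn, feat_fn,
                n_particles=64, n_iters=20, sigma=0.5, decay=0.85,
                temperature=0.1, seed=0):
    """Illustrative particle-filter-style pose refinement (hypothetical API).

    render_fn(pose) -> image and feat_fn(image) -> feature vector are
    placeholders for a renderable scene representation and a pre-trained
    feature extractor. Poses are plain vectors here, which is an assumption.
    """
    rng = np.random.default_rng(seed)
    init_pose = np.asarray(init_pose, dtype=float)

    def score(pose):
        # Higher is better: negative feature distance between render and query.
        return -float(np.linalg.norm(feat_fn(render_fn(pose)) - query_feat))

    best_pose, best_score = init_pose.copy(), score(init_pose)
    particles = np.tile(init_pose, (n_particles, 1))
    for _ in range(n_iters):
        # Perturb every particle with Gaussian noise.
        particles = particles + rng.normal(0.0, sigma, size=particles.shape)
        # Render each candidate pose and compare its features to the query.
        scores = np.array([score(p) for p in particles])
        top = int(np.argmax(scores))
        if scores[top] > best_score:
            best_pose, best_score = particles[top].copy(), float(scores[top])
        # Resample particles in proportion to softmax-normalized weights.
        weights = np.exp((scores - scores.max()) / temperature)
        weights /= weights.sum()
        particles = particles[rng.choice(n_particles, size=n_particles, p=weights)]
        # Shrink the search radius as the estimate improves.
        sigma *= decay
    return best_pose, best_score


# Toy stand-ins so the sketch runs end to end; a real setup would plug in
# an actual renderer and a pre-trained network here.
true_pose = np.array([1.0, -2.0, 0.5])
render_fn = lambda p: np.concatenate([p, 2.0 * p])  # hypothetical "renderer"
feat_fn = lambda img: img                           # hypothetical "features"
query_feat = feat_fn(render_fn(true_pose))
init_pose = true_pose + 1.0                         # coarse prior, e.g. from retrieval
refined, refined_score = refine_pose(query_feat, init_pose, render_fn, feat_fn)
```

Nothing in this loop is trained: the only scene-specific components are the renderer and the query features, which is the property the abstract emphasizes. A real 6-DoF version would perturb translation and rotation on the pose manifold rather than adding noise to a flat vector.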