Existing visual localization methods are typically either 2D image-based, which are easy to build and maintain but limited in geometric reasoning, or 3D structure-based, which achieve high accuracy but require a centralized reconstruction and are difficult to update. In this work, we revisit visual localization with a 2D image-based representation and propose to augment each image with an estimated depth map to capture the geometric structure. Supported by the effective use of dense matchers, this representation is not only easy to build and maintain, but also achieves the highest accuracy in challenging conditions. With compact compression and a GPU-accelerated LO-RANSAC implementation, the whole pipeline is efficient in both storage and computation and allows for a flexible trade-off between accuracy and memory efficiency. Our method achieves new state-of-the-art accuracy on various standard benchmarks and outperforms existing memory-efficient methods at comparable map sizes. Code will be available at https://github.com/cvg/Hierarchical-Localization.
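To illustrate how a per-image depth map can stand in for a global 3D model, the following is a minimal sketch of lifting 2D-2D matches between a query and a database image into 2D-3D correspondences, which a robust pose solver such as LO-RANSAC could then consume. It is not the released implementation: the function names, the pinhole intrinsics `(fx, fy, cx, cy)`, the nearest-pixel depth lookup, and the camera-to-world pose convention `X_w = R_wc X_c + t_wc` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with depth along the optical axis to a camera-frame point."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def cam_to_world(p_cam: Vec3, R_wc: Sequence[Sequence[float]],
                 t_wc: Sequence[float]) -> Vec3:
    """Apply the assumed camera-to-world pose: X_w = R_wc @ X_c + t_wc."""
    return tuple(
        sum(R_wc[i][j] * p_cam[j] for j in range(3)) + t_wc[i] for i in range(3)
    )


def lift_matches(matches: List[Tuple[Pixel, Pixel]],
                 depth_map: Sequence[Sequence[float]],
                 intrinsics: Tuple[float, float, float, float],
                 R_wc: Sequence[Sequence[float]],
                 t_wc: Sequence[float]) -> List[Tuple[Pixel, Vec3]]:
    """Turn (query pixel, database pixel) matches into (query pixel, world point) pairs.

    Matches that fall outside the depth map or on invalid depth (<= 0) are dropped.
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(depth_map), len(depth_map[0])
    lifted = []
    for query_px, (u, v) in matches:
        row, col = int(round(v)), int(round(u))  # nearest-pixel depth lookup
        if not (0 <= row < height and 0 <= col < width):
            continue
        depth = depth_map[row][col]
        if depth <= 0:
            continue
        p_cam = backproject(u, v, depth, fx, fy, cx, cy)
        lifted.append((query_px, cam_to_world(p_cam, R_wc, t_wc)))
    return lifted
```

Because each database image carries its own depth map and pose, images can be added or removed in this sketch without touching a shared reconstruction, which is the maintenance property the 2D image-based representation is meant to preserve.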