Visual relocalization is a key technique for autonomous driving, robotics, and virtual/augmented reality. After decades of exploration, absolute pose regression (APR), scene coordinate regression (SCR), and hierarchical methods (HMs) have become the most popular frameworks. However, although APRs and SCRs are highly efficient, their accuracy is limited, especially in large-scale outdoor scenes; HMs are accurate but must store a large number of 2D descriptors for matching, which makes them inefficient. In this paper, we propose an efficient and accurate framework, called VRS-NeRF, for visual relocalization with sparse neural radiance fields. Specifically, we introduce an explicit geometric map (EGM) for 3D map representation and an implicit learning map (ILM) for sparse patch rendering. During localization, the EGM provides priors of sparse 2D points, and the ILM uses these sparse points to render patches with sparse NeRFs for matching. This allows us to discard a large number of 2D descriptors and thereby reduce the map size. Moreover, rendering patches only for useful points, rather than for all pixels in the whole image, significantly reduces the rendering time. The framework inherits the accuracy of HMs while avoiding their low efficiency. Experiments on the 7Scenes, CambridgeLandmarks, and Aachen datasets show that our method is much more accurate than APRs and SCRs, and performs close to HMs while being much more efficient.
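To make the idea of rendering patches only around sparse points concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera, a prior pose `(R, t)` that maps world points into the camera frame, and an odd square patch size; the function names `project_points` and `sparse_patch_pixels` are hypothetical. It only selects the pixels that a sparse NeRF would need to render; the rendering and matching steps themselves are not shown.

```python
import numpy as np


def project_points(points_w, R, t, K):
    """Project Nx3 world points with a pinhole camera (x_c = R @ x_w + t)."""
    points_c = points_w @ R.T + t
    depth = points_c[:, 2]
    in_front = depth > 1e-6
    uv_h = points_c @ K.T
    # Avoid dividing by non-positive depth; such points are masked out anyway.
    safe_depth = np.where(in_front, depth, 1.0)
    uv = uv_h[:, :2] / safe_depth[:, None]
    return uv, in_front


def sparse_patch_pixels(points_w, R, t, K, image_size, patch_size=15):
    """Return (M, P, P, 2) pixel coordinates (u, v) of patches centred on the
    map points that project fully inside the image, plus the visibility mask."""
    if patch_size % 2 == 0:
        raise ValueError("patch_size is assumed to be odd in this sketch")
    width, height = image_size
    half = patch_size // 2
    uv, in_front = project_points(points_w, R, t, K)
    centres = np.rint(uv).astype(int)
    inside = (
        in_front
        & (centres[:, 0] >= half) & (centres[:, 0] < width - half)
        & (centres[:, 1] >= half) & (centres[:, 1] < height - half)
    )
    centres = centres[inside]
    offsets = np.arange(-half, half + 1)
    # dv varies along rows, du along columns.
    dv, du = np.meshgrid(offsets, offsets, indexing="ij")
    grid = np.stack([du, dv], axis=-1)  # (P, P, 2) offsets in (u, v) order
    return centres[:, None, None, :] + grid[None], inside


# Toy example with made-up intrinsics and three map points (one behind the camera).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 4.0], [0.0, 0.0, -2.0]])
pixels, visible = sparse_patch_pixels(points, np.eye(3), np.zeros(3), K, (640, 480))
print(pixels.shape)  # (2, 15, 15, 2)
```

In this toy setup only 2 × 15 × 15 = 450 pixels would need to be rendered instead of the 640 × 480 = 307,200 pixels of the full image, which illustrates why restricting rendering to sparse patches can cut rendering time.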