Neural radiance fields (NeRFs) are a powerful tool for implicit scene representation, enabling differentiable rendering and the prediction of unseen viewpoints. There has been growing interest in object- and scene-based localisation using NeRFs, with a number of recent works relying on sampling-based or Monte Carlo localisation schemes. Unfortunately, these schemes can be extremely computationally expensive, requiring multiple network forward passes to infer camera or object pose. To alleviate this, a variety of sampling strategies have been applied, many relying on keypoint recognition techniques from classical computer vision. This work conducts a systematic empirical comparison of these approaches and shows that, in contrast to conventional feature-matching approaches for geometry-based localisation, sampling-based localisation using NeRFs benefits significantly from stable features. Results show that rendering stable features yields significantly better pose estimates while reducing the number of forward passes required tenfold.
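To make the cost structure of sampling-based localisation concrete, the following is a minimal toy sketch of a Monte Carlo (particle-filter style) pose search. It is not the method evaluated in this work: the scene is one-dimensional, `render` is a hypothetical analytic stand-in for a NeRF forward pass, and `localise`, the particle count, the noise scale, and the error temperature are all illustrative assumptions. The `pixels` argument marks where a sampling strategy (for example, keypoint-based pixel selection) would plug in, and the `passes` counter shows why the number of forward passes grows with the number of pose hypotheses and iterations.

```python
import numpy as np


def render(pose, pixels):
    """Hypothetical stand-in for a NeRF forward pass.

    Returns intensities at the 1-D coordinates `pixels` for a camera
    shifted by `pose`. The toy scene is a single Gaussian blob.
    """
    x = pixels - pose
    return np.exp(-0.5 * (x / 2.0) ** 2)


def localise(observed, pixels, n_particles=200, n_iters=10, seed=0):
    """Toy Monte Carlo localisation over a scalar pose.

    Each particle is one pose hypothesis; scoring it costs one call to
    `render`, counted here as one forward pass.
    """
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-10.0, 10.0, n_particles)
    passes = 0
    for _ in range(n_iters):
        # Photometric error of each hypothesis at the sampled pixels.
        errs = np.array(
            [np.mean((render(p, pixels) - observed) ** 2) for p in particles]
        )
        passes += n_particles
        # Convert errors to normalised weights (0.01 is an assumed temperature).
        weights = np.exp(-errs / 0.01)
        weights /= weights.sum()
        # Resample in proportion to weight, then diffuse with small noise.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx] + rng.normal(0.0, 0.2, n_particles)
    return particles.mean(), passes


true_pose = 3.0
pixels = np.linspace(-15.0, 15.0, 31)  # the sampled pixel locations
observed = render(true_pose, pixels)
estimate, passes = localise(observed, pixels)
print(f"estimated pose: {estimate:.2f}, forward passes: {passes}")
```

In this sketch the forward-pass count is simply `n_particles * n_iters`, so any pixel selection that lets fewer hypotheses or iterations reach the same accuracy reduces cost directly. The toy scene says nothing about which selection strategy is best; that is the empirical question addressed here.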