We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is designed to operate in real time and does not require an initial pose guess close to the sought solution. IFFNeRF uses the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and infer the color of each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a least-squares problem by selecting correspondences between the query image and the resulting ray bundle. We facilitate this selection with a learned attention mechanism that relates the embedding of the query image to the embeddings of the parameterized rays, thereby matching the rays pertinent to the image. In synthetic and real evaluation settings, we show that our method improves angular and translation accuracy by 80.1% and 67.3%, respectively, compared to iNeRF, while running at 34 fps on consumer hardware and requiring no initial pose guess.
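The least-squares step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each selected ray is given by a surface point (its origin) and a direction, and that all selected rays should pass through the camera center. Under that assumption, the center is the point minimizing the weighted sum of squared distances to the rays. The function name `estimate_camera_center` and the optional `weights` argument (standing in for attention scores) are hypothetical and are not taken from the paper. The sketch recovers only the translation part of the pose, and the authors' actual formulation may differ.

```python
import numpy as np


def estimate_camera_center(origins, directions, weights=None):
    """Return the point closest, in the least-squares sense, to a bundle of rays.

    origins:    (N, 3) ray origins, e.g. sampled surface points
    directions: (N, 3) ray directions (need not be normalized)
    weights:    optional (N,) non-negative weights (hypothetical: e.g. attention scores)
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    w = np.ones(len(d)) if weights is None else np.asarray(weights, dtype=float)

    # P_i = I - d_i d_i^T projects onto the plane orthogonal to ray i, so
    # ||P_i (c - o_i)|| is the distance from point c to the line of ray i.
    P = np.eye(3)[None, :, :] - d[:, :, None] * d[:, None, :]

    # Normal equations of min_c sum_i w_i ||P_i (c - o_i)||^2:
    #   (sum_i w_i P_i) c = sum_i w_i P_i o_i
    A = np.einsum("i,ijk->jk", w, P)
    b = np.einsum("i,ijk,ik->j", w, P, origins)
    center, *_ = np.linalg.lstsq(A, b, rcond=None)
    return center


# Synthetic check: rays cast from random surface points toward a known center.
rng = np.random.default_rng(0)
true_center = np.array([0.5, -1.0, 2.0])
surface_points = rng.normal(size=(50, 3))
ray_dirs = true_center - surface_points
estimate = estimate_camera_center(surface_points, ray_dirs)
```

With noise-free rays the estimate coincides with the true center up to numerical precision. With mismatched or noisy rays, the weights control how much each correspondence contributes to the solution.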