Implicit neural representations have emerged as a powerful method for reconstructing 3D scenes from 2D images. Given a set of camera poses and associated images, such models can be trained to synthesize novel, unseen views. To expand the use cases of implicit neural representations, we need to incorporate camera pose estimation into representation learning, since this is necessary for reconstructing scenes from real-world video sequences, where cameras are generally not tracked. Existing approaches, such as COLMAP and, more recently, bundle-adjusting neural radiance field (NeRF) methods, often suffer from lengthy processing times. These delays, ranging from hours to days, arise from laborious feature matching, hardware limitations, dense point sampling, and the long training times required by multi-layer perceptron (MLP) architectures with large numbers of parameters. To address these challenges, we propose a framework called bundle-adjusting accelerated neural graphics primitives (BAA-NGP). Our approach leverages accelerated sampling and hash encoding to expedite both pose refinement/estimation and 3D scene reconstruction. Experimental results demonstrate that our method achieves speedups of 10 to 20 $\times$ or more in novel view synthesis compared with other bundle-adjusting neural radiance field methods, without sacrificing the quality of pose estimation. The GitHub repository is available at https://github.com/IntelLabs/baa-ngp.
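The abstract attributes part of the speedup to hash encoding. As a rough illustration of that idea, the sketch below implements a multiresolution hash-grid lookup in the style of Instant-NGP. It is not taken from the BAA-NGP repository. The per-axis prime constants follow the Instant-NGP spatial hash. The level count, resolutions, table size, feature width, and the names `HashGridLevel`, `make_levels`, and `encode` are illustrative assumptions. The sketch uses pure Python for readability rather than speed, and it omits training entirely.

```python
import itertools
import math
import random

# Per-axis primes used by the Instant-NGP spatial hash (the first axis uses 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(coords, table_size):
    """Hash non-negative integer 3D grid coordinates into [0, table_size).

    XOR the coordinates after multiplying by per-axis primes, then take the
    result modulo the table size (a power of two in this sketch).
    """
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % table_size


class HashGridLevel:
    """One resolution level: a hash table of trainable feature vectors (hypothetical class)."""

    def __init__(self, resolution, table_size, n_features=2, seed=0):
        rng = random.Random(seed)
        self.resolution = resolution
        self.table_size = table_size
        self.n_features = n_features
        # Small random initialization; a real system would learn these entries.
        self.table = [
            [rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
            for _ in range(table_size)
        ]

    def lookup(self, x):
        """Trilinearly interpolate hashed corner features for x in [0, 1]^3."""
        scaled = [xi * self.resolution for xi in x]
        base = [math.floor(s) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        out = [0.0] * self.n_features
        for offset in itertools.product((0, 1), repeat=3):
            # Trilinear weight of this cell corner.
            w = 1.0
            for o, f in zip(offset, frac):
                w *= f if o else 1.0 - f
            corner = [b + o for b, o in zip(base, offset)]
            feat = self.table[spatial_hash(corner, self.table_size)]
            for k in range(self.n_features):
                out[k] += w * feat[k]
        return out


def make_levels(n_levels=4, n_min=16, n_max=512, table_size=2**14, n_features=2):
    """Build levels whose resolutions grow geometrically from n_min to about n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [
        HashGridLevel(math.floor(n_min * growth**lvl), table_size, n_features, seed=lvl)
        for lvl in range(n_levels)
    ]


def encode(x, levels):
    """Concatenate per-level features; a small MLP would consume this vector."""
    features = []
    for level in levels:
        features.extend(level.lookup(x))
    return features


if __name__ == "__main__":
    levels = make_levels()
    features = encode((0.25, 0.5, 0.75), levels)
    print(len(features))  # 8: 4 levels x 2 features per level
```

Each 3D point touches only eight table entries per level, plus a small MLP downstream. This is why hash-encoded models can train much faster than large MLP-only NeRFs. Hash collisions are not resolved explicitly in this sketch. In a full bundle-adjusting system, the table entries, the MLP weights, and the camera poses would be optimized jointly by gradient descent, which is omitted here.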