Neural Radiance Fields (NeRF) can be optimized to obtain high-fidelity 3D reconstructions of objects and large-scale scenes. However, NeRFs require accurate camera parameters as input: inaccurate camera parameters result in blurry renderings. Extrinsic and intrinsic camera parameters are usually estimated using Structure-from-Motion (SfM) methods as a pre-processing step to NeRF, but these techniques rarely yield perfect estimates. Prior works have therefore proposed jointly optimizing camera parameters alongside a NeRF, but these methods are prone to local minima in challenging settings. In this work, we analyze how different camera parameterizations affect this joint optimization problem, and observe that in standard parameterizations, small perturbations of different parameters produce effects of very different magnitude, which can lead to an ill-conditioned optimization problem. We propose using a proxy problem to compute a whitening transform that eliminates the correlation between camera parameters and normalizes their effects, and we propose to use this transform as a preconditioner for the camera parameters during joint optimization. Our preconditioned camera optimization significantly improves reconstruction quality on scenes from the Mip-NeRF 360 dataset: we reduce error rates (RMSE) by 67% compared to state-of-the-art NeRF approaches that do not optimize cameras, such as Zip-NeRF, and by 29% relative to state-of-the-art joint optimization approaches using the camera parameterization of SCNeRF. Our approach is easy to implement, does not significantly increase runtime, can be applied to a wide variety of camera parameterizations, and can straightforwardly be incorporated into other NeRF-like models.
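To make the whitening idea concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes the proxy problem is the projection of a fixed set of 3D points into the image, and that the whitening transform is the inverse square root of J^T J, where J is the Jacobian of the projected pixel coordinates with respect to the camera parameters. The toy camera model (`project`, with parameters focal length and translation), the finite-difference Jacobian, and the `damping` term are all illustrative assumptions.

```python
import numpy as np

def project(params, points):
    """Hypothetical toy pinhole camera with params = [focal, tx, ty, tz]."""
    focal, tx, ty, tz = params
    cam = points + np.array([tx, ty, tz])  # translate points into the camera frame
    # Perspective division, then stack all pixel coordinates into one vector.
    return (focal * cam[:, :2] / cam[:, 2:3]).ravel()

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (one column per parameter)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def whitening_preconditioner(jac, damping=1e-8):
    """Return (J^T J + damping * I)^(-1/2) via a symmetric eigendecomposition."""
    cov = jac.T @ jac
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt = 1.0 / np.sqrt(eigvals + damping)
    # V diag(inv_sqrt) V^T: scale each eigenvector column, then rotate back.
    return (eigvecs * inv_sqrt) @ eigvecs.T

# Proxy problem: random points in front of an assumed initial camera.
rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(64, 3)) + np.array([0.0, 0.0, 4.0])
params0 = np.array([500.0, 0.0, 0.0, 0.0])

jac = numerical_jacobian(lambda p: project(p, points), params0)
precond = whitening_preconditioner(jac)

# Optimizing latent parameters z with params = params0 + precond @ z
# changes the proxy Jacobian from J to J @ precond.
jac_whitened = jac @ precond

cond_before = np.linalg.cond(jac.T @ jac)
cond_after = np.linalg.cond(jac_whitened.T @ jac_whitened)
print(f"condition number before: {cond_before:.3e}, after: {cond_after:.6f}")
```

In this toy setting, a unit change in focal length moves the projected points far less than a unit change in translation, so J^T J is badly conditioned. After the change of variables, the proxy Jacobian has approximately orthonormal columns, so every latent parameter has a comparable and uncorrelated effect on the projected points. In a joint optimization, gradient steps would then be taken on the latent parameters rather than on the camera parameters directly.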