Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural rendering methods such as NeRF. This encoding requires accurate camera poses to render a given scene. However, unlike earlier methods that jointly optimize camera poses and 3D scenes, naive gradient-based camera pose refinement with multi-resolution hash encoding severely degrades performance. We propose a joint optimization algorithm that calibrates the camera poses and learns a geometric representation using efficient multi-resolution hash encoding. We show that the oscillating gradient flows of hash encoding interfere with camera pose registration, and we address this by using a smooth interpolation weighting that stabilizes the gradient oscillation for ray samples across hash grids. Moreover, a curriculum training procedure helps learn the level-wise hash encoding, further improving pose refinement. Experiments on novel-view synthesis datasets validate that our learning framework achieves state-of-the-art performance and rapid convergence of neural rendering, even when initial camera poses are unknown.
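To make the gradient-oscillation argument concrete, the following is a minimal 1D sketch, not the paper's implementation. It compares plain linear interpolation between grid nodes with a smooth weighting and measures how much the gradient with respect to sample position jumps when a sample crosses a grid node. The cosine ramp in `smooth_weight`, the toy `grid` values, and all function names are assumptions chosen for illustration; the abstract does not specify the exact weighting.

```python
import numpy as np

def linear_weight(frac):
    # Standard linear interpolation weight: its slope is 1 everywhere in the cell.
    return frac

def smooth_weight(frac):
    # Hypothetical smooth weighting (cosine ramp): its slope is 0 at both cell edges.
    return 0.5 * (1.0 - np.cos(np.pi * frac))

def interpolate_1d(grid, pos, weight_fn):
    """Interpolate values stored at integer grid nodes at a continuous position."""
    pos = np.asarray(pos, dtype=float)
    i0 = np.clip(np.floor(pos).astype(int), 0, len(grid) - 2)
    frac = pos - i0
    w = weight_fn(frac)
    return (1.0 - w) * grid[i0] + w * grid[i0 + 1]

def position_gradient(grid, pos, weight_fn, eps=1e-5):
    """Central finite difference of the interpolated value w.r.t. position."""
    hi = interpolate_1d(grid, pos + eps, weight_fn)
    lo = interpolate_1d(grid, pos - eps, weight_fn)
    return (hi - lo) / (2.0 * eps)

# Toy feature values at four grid nodes (illustrative only).
grid = np.array([0.0, 1.0, -1.0, 0.5])

# Sample just left and just right of the node at position 1.0.
left, right = 1.0 - 1e-3, 1.0 + 1e-3

jump_linear = abs(position_gradient(grid, right, linear_weight)
                  - position_gradient(grid, left, linear_weight))
jump_smooth = abs(position_gradient(grid, right, smooth_weight)
                  - position_gradient(grid, left, smooth_weight))

print("gradient jump across node, linear weight:", jump_linear)
print("gradient jump across node, smooth weight:", jump_smooth)
```

With linear weights the position gradient inside a cell equals the difference of the two neighboring node values, so it flips abruptly whenever a sample crosses a node. With the cosine ramp the gradient tapers to zero at cell edges, so the jump nearly vanishes. Since camera pose updates are driven by gradients with respect to sample positions, this is one plausible reading of why a smooth weighting would stabilize pose registration.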