Neural radiance fields enable novel-view synthesis and scene reconstruction with photorealistic quality from a few images, but require known and accurate camera poses. Conventional pose estimation algorithms fail on smooth or self-similar scenes, while methods performing inverse rendering from unposed views require a rough initialization of the camera orientations. The main difficulty of pose estimation is that real-life objects are almost invariant under certain transformations, which makes the photometric distance between rendered views non-convex with respect to the camera parameters. Using an equivalence relation that matches the distribution of local minima in camera space, we reduce this space to its quotient set, in which pose estimation becomes a more convex problem. Using a neural network to regularize pose estimation, we demonstrate that our method, MELON, can reconstruct a neural radiance field from unposed images with state-of-the-art accuracy while requiring ten times fewer views than adversarial approaches.
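To illustrate the quotient-set idea, the following is a minimal toy sketch in NumPy, not the MELON implementation. It assumes a one-dimensional "scene" on a circle that is almost invariant under rotations by 2π/M, a camera pose reduced to a single integer azimuth shift, and a mean-squared photometric loss. The names `M`, `N`, `scene`, `render`, `photometric_loss`, and `class_loss` are all hypothetical and chosen for illustration only.

```python
import numpy as np

M = 4    # assumed number of near-symmetric copies of the object (hypothetical)
N = 360  # number of discrete azimuth samples (one per degree)

phi = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
# Toy appearance: a strong M-fold symmetric term plus a weak symmetry-breaking term,
# so the scene is almost (but not exactly) invariant under rotations by 2*pi/M.
scene = np.cos(M * phi) + 0.2 * np.cos(phi)


def render(shift):
    """Toy 'view' of the scene for a camera azimuth of `shift` samples."""
    return np.roll(scene, shift)


def photometric_loss(shift, target):
    """Mean-squared photometric distance between a rendered view and a target view."""
    return float(np.mean((render(shift) - target) ** 2))


def count_local_minima(values):
    """Count strict local minima of a periodic 1-D array."""
    return int(((values < np.roll(values, 1)) & (values < np.roll(values, -1))).sum())


true_shift = 37
target = render(true_shift)

# Loss over the full camera space: one basin per near-symmetric copy.
losses = np.array([photometric_loss(s, target) for s in range(N)])
num_local_minima = count_local_minima(losses)

# Equivalence relation: azimuths that differ by a multiple of N / M are equivalent.
period = N // M


def class_loss(rep):
    """Lowest loss within the equivalence class of `rep`, and the member achieving it."""
    candidates = [(rep + k * period) % N for k in range(M)]
    best = min(candidates, key=lambda c: losses[c])
    return losses[best], best


# Loss over the quotient set: one representative per equivalence class.
quotient_losses = np.array([class_loss(r)[0] for r in range(period)])
num_quotient_minima = count_local_minima(quotient_losses)

# Starting from a wrong member of the correct class still recovers the true pose.
wrong_member = (true_shift + period) % N
recovered = class_loss(wrong_member % period)[1]

print(num_local_minima, num_quotient_minima, recovered)
```

In this toy setting the loss over the full azimuth range has one local minimum per near-symmetric copy, while the loss over the quotient set has a single basin; the remaining ambiguity is resolved by comparing the M equivalent candidates. How the equivalence relation is chosen for real camera spaces is not specified here and would depend on the method's details.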