Learning accurate scene reconstruction without pose priors in neural radiance fields is challenging due to inherent geometric ambiguity. Recent developments either rely on correspondence priors for regularization or use off-the-shelf flow estimators to derive analytical poses. However, the potential of jointly learning scene geometry, camera poses, and dense flow within a unified neural representation remains largely unexplored. In this paper, we present Flow-NeRF, a unified framework that simultaneously optimizes scene geometry, camera poses, and dense optical flow on the fly. To enable the learning of dense flow within the neural radiance field, we design and build a bijective mapping for flow estimation, conditioned on pose. To allow scene reconstruction to benefit from the flow estimation, we develop an effective feature enhancement mechanism that passes canonical-space features to world-space representations, significantly improving scene geometry. We validate our model on four important tasks, i.e., novel view synthesis, depth estimation, camera pose prediction, and dense optical flow estimation, across several datasets. Our approach surpasses previous methods in almost all metrics for novel view synthesis and depth estimation and yields novel-view flow that is both qualitatively sound and quantitatively accurate. Our project page is https://zhengxunzhi.github.io/flownerf/.
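As a rough illustration of what a pose-conditioned bijective mapping can look like (this is a minimal sketch in the style of a normalizing-flow coupling layer, not the authors' implementation), the snippet below shows an invertible transform whose parameters are predicted from a pose embedding. All class names, dimensions, and the affine parameterization are assumptions for illustration only.

```python
# Hypothetical sketch of a pose-conditioned bijective mapping,
# illustrated as an affine coupling layer (not the Flow-NeRF design).
import torch
import torch.nn as nn


class PoseConditionedCoupling(nn.Module):
    """Invertible map f(x; pose): splits x, transforms one half with a
    scale/shift predicted from the other half and the pose embedding."""

    def __init__(self, dim: int = 4, pose_dim: int = 6, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, pose):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, pose], dim=-1)).chunk(2, dim=-1)
        y2 = x2 * torch.exp(torch.tanh(s)) + t      # invertible affine transform
        return torch.cat([x1, y2], dim=-1)

    def inverse(self, y, pose):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, pose], dim=-1)).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-torch.tanh(s))   # exact inverse
        return torch.cat([y1, x2], dim=-1)


# Usage: round-tripping a batch of points recovers the input (up to float error).
layer = PoseConditionedCoupling()
x = torch.randn(8, 4)
pose = torch.randn(8, 6)
y = layer(x, pose)
assert torch.allclose(layer.inverse(y, pose), x, atol=1e-5)
```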