We propose MonoNeRF, a generalizable neural radiance field that can be trained on large-scale monocular videos captured by a camera moving through static scenes, without any ground-truth annotations of depth or camera pose. MonoNeRF follows an autoencoder-based architecture: the encoder estimates the monocular depth and the camera pose, and the decoder constructs a Multiplane NeRF representation from the depth encoder features and renders the input frames with the estimated camera. Learning is supervised by the reconstruction error. Once trained, the model can be applied to several tasks, including depth estimation, camera pose estimation, and single-image novel view synthesis. More qualitative results are available at https://oasisyang.github.io/mononerf.
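To make the rendering and supervision step concrete, the following is a minimal sketch of how a stack of fronto-parallel planes could be composited into an image and depth map and then compared against the input frame. It is not the MonoNeRF implementation: the abstract does not give the rendering equations, so the sketch assumes standard NeRF-style front-to-back compositing and an L1 photometric loss. The names `composite_planes` and `reconstruction_loss` are hypothetical, and the warp of the planes by the estimated camera pose is omitted.

```python
import numpy as np


def composite_planes(rgb, sigma, depths):
    """Front-to-back compositing over D fronto-parallel planes (assumed form).

    rgb:    (D, H, W, 3) colour per plane, nearest plane first
    sigma:  (D, H, W)    non-negative density per plane
    depths: (D,)         increasing plane depths, D >= 2
    Returns the rendered image (H, W, 3) and the expected depth (H, W).
    """
    # Spacing between consecutive planes; the last interval repeats the previous one.
    last_gap = depths[-1] - depths[-2]
    deltas = np.diff(depths, append=depths[-1] + last_gap)
    # Opacity of each plane from its density and thickness.
    alpha = 1.0 - np.exp(-sigma * deltas[:, None, None])
    # Transmittance: fraction of light that passes planes 0..i-1 to reach plane i.
    ones = np.ones_like(alpha[:1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[:-1]], axis=0), axis=0)
    weights = trans * alpha
    image = (weights[..., None] * rgb).sum(axis=0)
    depth = (weights * depths[:, None, None]).sum(axis=0)
    return image, depth


def reconstruction_loss(rendered, target):
    """Mean absolute photometric error (an assumed choice of reconstruction error)."""
    return float(np.mean(np.abs(rendered - target)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, H, W = 8, 4, 4
    rgb = rng.random((D, H, W, 3))
    sigma = rng.random((D, H, W))
    depths = np.linspace(1.0, 10.0, D)
    image, depth = composite_planes(rgb, sigma, depths)
    print(image.shape, depth.shape, reconstruction_loss(image, rgb[0]))
```

In a trainable version, `rgb` and `sigma` would be produced by the decoder and the loss would be backpropagated through the compositing into both the depth and pose branches; an autodiff framework would replace NumPy for that purpose.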