3D reconstruction from a single 2D image has been extensively studied, but existing methods typically train with depth supervision. To relax the dependence on costly-to-acquire datasets, we propose SceneRF, a self-supervised monocular scene reconstruction method that uses only posed image sequences for training. Building on recent progress in neural radiance fields (NeRF), we optimize a radiance field with explicit depth optimization and a novel probabilistic sampling strategy that efficiently handles large scenes. At inference, a single input image suffices to hallucinate novel depth views, which are fused together to obtain a 3D scene reconstruction. Thorough experiments demonstrate that we outperform all recent baselines for novel depth view synthesis and scene reconstruction on the indoor BundleFusion and outdoor SemanticKITTI datasets. Our code is available at https://astra-vision.github.io/SceneRF.
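To make concrete how a depth view can be read out of a radiance field, the following minimal sketch computes the expected termination depth of one ray using the standard NeRF volume rendering weights. It is an illustration under stated assumptions, not SceneRF's implementation: the function name `expected_depth`, its parameters, and the `far_delta` padding value are hypothetical, and the sketch covers neither the explicit depth optimization nor the probabilistic sampling strategy mentioned above.

```python
import math


def expected_depth(sigmas, ts, far_delta=1e10):
    """Expected ray depth from standard NeRF volume rendering weights.

    sigmas: non-negative densities at the sample points along one ray.
    ts: increasing distances of those sample points from the camera.
    far_delta: assumed interval length after the last sample (hypothetical
        default, following the common NeRF convention of a very large value).
    Returns (depth, weights).
    """
    if len(sigmas) != len(ts):
        raise ValueError("sigmas and ts must have the same length")

    weights = []
    transmittance = 1.0  # probability that the ray reaches the current sample
    for i, (sigma, t) in enumerate(zip(sigmas, ts)):
        # Distance to the next sample; the last interval is padded.
        delta = ts[i + 1] - t if i + 1 < len(ts) else far_delta
        # Opacity of the interval starting at this sample.
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha

    # Depth is the weight-averaged sample distance.
    depth = sum(w * t for w, t in zip(weights, ts))
    return depth, weights


if __name__ == "__main__":
    # Hypothetical ray: empty space except one opaque sample at distance 5.
    depth, weights = expected_depth([0.0, 0.0, 1e6, 0.0], [1.0, 3.0, 5.0, 7.0])
    print(depth, weights)
```

In this sketch a ray that hits a single opaque sample returns that sample's distance as its depth, while a ray through empty space returns zero weights and a depth of zero. Rendering one such depth value per pixel from several virtual camera poses would give the depth views that a fusion step could then merge into a scene reconstruction.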