High-fidelity novel view synthesis is essential for robotic navigation and interaction but remains challenging, particularly in indoor environments and real-time scenarios. Existing techniques often demand significant computational resources for both training and rendering, and they frequently produce suboptimal 3D representations due to insufficient geometric structuring. To address these limitations, we introduce VoxNeRF, a novel approach that leverages easy-to-obtain geometry priors to improve both the quality and efficiency of neural indoor reconstruction and novel view synthesis. We propose an efficient voxel-guided sampling technique that selectively allocates computation to the most relevant segments of each ray based on a voxel-encoded geometry prior, significantly reducing training and rendering time. Additionally, we incorporate a robust depth loss to improve reconstruction and rendering quality in sparse-view settings. We validate our approach with extensive experiments on ScanNet and ScanNet++, where VoxNeRF outperforms existing state-of-the-art methods and establishes a new benchmark for indoor immersive interpolation and extrapolation settings.
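To illustrate the core idea behind voxel-guided sampling, the following is a minimal sketch (an assumption for illustration, not the authors' implementation): given a coarse occupancy grid encoding the geometry prior, dense candidate samples along a ray are filtered so that only samples falling inside occupied voxels are evaluated, concentrating compute on the relevant ray segments near surfaces. All names and parameters here are hypothetical.

```python
import numpy as np

def voxel_guided_samples(origin, direction, occupancy, voxel_size,
                         near=0.0, far=5.0, n_candidates=256):
    """Return sample distances along a ray that land in occupied voxels.

    Hypothetical helper: illustrates pruning ray samples with a
    voxel-encoded occupancy prior, as described in the abstract.
    """
    # Dense candidate distances along the ray (what a plain NeRF would use).
    t = np.linspace(near, far, n_candidates)
    pts = origin[None, :] + t[:, None] * direction[None, :]

    # Map each 3D point to its voxel index in the occupancy grid.
    idx = np.floor(pts / voxel_size).astype(int)

    # Discard candidates that leave the grid bounds.
    shape = np.array(occupancy.shape)
    inside = np.all((idx >= 0) & (idx < shape), axis=1)

    # Keep only candidates whose voxel is marked occupied by the prior.
    keep = np.zeros(n_candidates, dtype=bool)
    ii = idx[inside]
    keep[inside] = occupancy[ii[:, 0], ii[:, 1], ii[:, 2]]
    return t[keep]

# Toy example: an 8x8x8 grid (0.5 m voxels) with one occupied "wall" slab
# at x-index 4, i.e. world x in [2.0, 2.5).
occ = np.zeros((8, 8, 8), dtype=bool)
occ[4, :, :] = True
ts = voxel_guided_samples(np.zeros(3), np.array([1.0, 0.1, 0.1]),
                          occ, voxel_size=0.5)
# Only the few samples intersecting the occupied slab survive,
# instead of all 256 dense candidates.
```

In a full pipeline, the surviving distances `ts` would be the only points queried against the radiance field, which is where the training- and rendering-time savings come from; the sketch omits the NeRF evaluation itself.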