Recent advances in neural radiance fields (NeRFs) achieve state-of-the-art novel view synthesis and facilitate dense estimation of scene properties. However, NeRFs often fail on large, unbounded scenes captured from very sparse views, with the scene content concentrated far from the camera, as is typical in field robotics applications. In particular, NeRF-style algorithms perform poorly (1) when there are too few views with little pose diversity, (2) when scenes contain saturation and shadows, and (3) when finely sampling large unbounded scenes with fine structures becomes computationally intensive. This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views. This is achieved by decoupling occupancy and color learning within the NeRF framework into separate multi-layer perceptrons (MLPs) trained on LiDAR and camera data, respectively. In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGMs) alongside the NeRF model and leverages this occupancy grid to improve the sampling of points along a ray for volumetric rendering in metric space. Through extensive quantitative and qualitative experiments on scenes from the KITTI dataset, this paper demonstrates that the proposed method outperforms state-of-the-art NeRF models on both novel view synthesis and dense depth prediction tasks when trained on sparse input data.
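To make the idea of occupancy-guided ray sampling in metric space concrete, the following is a minimal sketch and not the CLONeR implementation. It assumes a hypothetical sparse grid stored as a dictionary from integer voxel indices to occupancy probabilities, a fixed marching step, and a fixed threshold. The function name `occupancy_guided_samples` and all of its parameters are illustrative assumptions. The sketch only shows how a grid can prune samples in free space so that the rendering network is queried near likely surfaces.

```python
import math


def occupancy_guided_samples(origin, direction, occupancy, voxel_size,
                             t_near, t_far, step, threshold=0.5):
    """Return metric distances t along a ray whose sample point lies in a
    voxel with occupancy probability >= threshold.

    occupancy is a hypothetical sparse grid: a dict mapping integer voxel
    indices (i, j, k) to a probability. Missing voxels are treated as free.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [c / norm for c in direction]  # unit direction, so t is in metres

    kept = []
    n_steps = int(math.floor((t_far - t_near) / step)) + 1
    for n in range(n_steps):
        t = t_near + n * step
        point = [o + t * u for o, u in zip(origin, unit)]
        # Index of the voxel that contains this sample point.
        voxel = tuple(math.floor(p / voxel_size) for p in point)
        if occupancy.get(voxel, 0.0) >= threshold:
            kept.append(t)
    return kept


if __name__ == "__main__":
    # One likely-occupied voxel and one likely-free voxel along the x axis.
    grid = {(5, 0, 0): 0.9, (7, 0, 0): 0.2}
    samples = occupancy_guided_samples(
        origin=(0.0, 0.5, 0.5), direction=(2.0, 0.0, 0.0),
        occupancy=grid, voxel_size=1.0,
        t_near=0.0, t_far=10.0, step=0.25)
    print(samples)  # [5.0, 5.25, 5.5, 5.75]
```

Of the 41 uniform candidates along this ray, only the four inside the occupied voxel are kept. A practical system would more likely traverse voxels directly and draw samples within each occupied cell rather than filter a uniform march, but the pruning effect is the same.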