We introduce a new task: novel view synthesis for LiDAR sensors. Traditional model-based LiDAR simulators, combined with style-transfer neural networks, can be applied to render novel views, but they fall short of producing accurate and realistic LiDAR patterns because their renderers rely on explicit 3D reconstruction and on game engines that ignore important attributes of LiDAR points. We address this challenge by formulating what is, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, which leverages a neural radiance field (NeRF) to jointly learn the geometry and the attributes of 3D points. However, simply employing NeRF does not achieve satisfactory results: it focuses only on learning individual pixels and ignores local information, especially in low-texture areas, which results in poor geometry. To this end, we introduce a structural regularization method that preserves local structural details. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories, seen from 360-degree viewpoints and captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset and on our object-level NeRF-MVL dataset show that LiDAR-NeRF significantly surpasses the model-based algorithms.
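To make the idea of learning geometry with a radiance field concrete, the following minimal sketch shows how an expected depth along a single ray can be computed with the standard NeRF volume-rendering quadrature. It is an illustration under stated assumptions, not the LiDAR-NeRF implementation: the function name `render_ray_depth`, its arguments, and the use of bin midpoints are hypothetical, and the densities would in practice come from a learned network.

```python
import math


def render_ray_depth(sigmas, t_edges):
    """Expected depth along one ray using standard NeRF quadrature.

    sigmas:  non-negative densities, one per bin (hypothetical network outputs).
    t_edges: increasing distances along the ray that bound the bins,
             so len(t_edges) == len(sigmas) + 1.
    Returns (expected_depth, weights).
    """
    if len(t_edges) != len(sigmas) + 1:
        raise ValueError("t_edges must have exactly one more entry than sigmas")

    weights = []
    transmittance = 1.0  # probability that the ray reaches the current bin
    depth = 0.0
    for i, sigma in enumerate(sigmas):
        delta = t_edges[i + 1] - t_edges[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this bin
        weight = transmittance * alpha          # chance the ray terminates here
        midpoint = 0.5 * (t_edges[i] + t_edges[i + 1])
        depth += weight * midpoint
        weights.append(weight)
        transmittance *= 1.0 - alpha
    return depth, weights


# Usage: an effectively opaque surface in the second bin, between 1 m and 2 m.
depth, weights = render_ray_depth([0.0, 1e6, 0.0], [0.0, 1.0, 2.0, 3.0])
```

Because every step is differentiable with respect to the densities, a loss on the rendered depth can be propagated back to whatever model produces them, which is the property an end-to-end rendering framework relies on.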