Dense 3D reconstruction has many applications in automated driving, including automated annotation validation, multimodal data augmentation, providing ground-truth annotations for systems lacking LiDAR, and enhancing auto-labeling accuracy. LiDAR provides highly accurate but sparse depth, whereas camera images enable dense depth estimation that is noisy, particularly at long range. In this paper, we harness the strengths of both sensors and propose a multimodal 3D scene reconstruction framework that combines neural implicit surfaces and radiance fields. In particular, our method estimates dense and accurate 3D structure and creates an implicit map representation based on signed distance fields, which can be rendered into RGB images and depth maps. A mesh can be extracted from the learned signed distance field and culled based on occlusion. Dynamic objects are efficiently filtered on the fly during sampling using 3D object detection models. We demonstrate qualitative and quantitative results on challenging automotive scenes.
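As an illustration of filtering dynamic objects during sampling, the following is a minimal sketch and not the authors' implementation. It assumes each detection is a yaw-rotated 3D box given as a hypothetical `(center, size, yaw)` tuple in the same frame as the sampled points, and it simply drops every sample point that falls inside any box. The function names `points_in_box` and `filter_dynamic_samples` are illustrative.

```python
import numpy as np


def points_in_box(points, center, size, yaw):
    """Return a boolean mask of points lying inside a yaw-rotated 3D box.

    Assumed (hypothetical) box format: center (x, y, z), size (length, width,
    height), yaw as rotation about the z-axis in radians.
    """
    # Express the points in the box frame: translate, then rotate by -yaw.
    local = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    c, s = np.cos(yaw), np.sin(yaw)
    x = c * local[:, 0] + s * local[:, 1]
    y = -s * local[:, 0] + c * local[:, 1]
    z = local[:, 2]
    half = np.asarray(size, dtype=float) / 2.0
    return (np.abs(x) <= half[0]) & (np.abs(y) <= half[1]) & (np.abs(z) <= half[2])


def filter_dynamic_samples(points, boxes):
    """Return a mask that is True for sample points outside every box."""
    points = np.asarray(points, dtype=float)
    keep = np.ones(len(points), dtype=bool)
    for center, size, yaw in boxes:
        keep &= ~points_in_box(points, center, size, yaw)
    return keep


# Usage: three sample points and one detected vehicle-sized box at the origin.
samples = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [1.9, 0.0, 0.0]])
detections = [((0.0, 0.0, 0.0), (4.0, 2.0, 2.0), 0.0)]
static_mask = filter_dynamic_samples(samples, detections)
static_samples = samples[static_mask]
```

In a training loop, a mask like this could be applied to the points sampled along each ray (or to LiDAR returns) so that samples on detected objects do not contribute to the loss. Whether to mask individual samples or whole rays is a design choice the abstract does not specify.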