We present a neural radiance field (NeRF) based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photorealistic texture. Our system extends the state-of-the-art NeRF representation to incorporate lidar data. Adding lidar imposes strong geometric constraints on depth and surface normals, which is particularly useful when modelling uniformly textured surfaces, which present ambiguous visual reconstruction cues. A key contribution of this work is a novel method to quantify the epistemic uncertainty of the lidar-visual NeRF reconstruction by estimating the spatial variance of each point location in the radiance field given the sensor observations from the cameras and lidar. This provides a principled approach for evaluating the contribution of each sensor modality to the final reconstruction. In this way, uncertain reconstructions (due to, e.g., uniform visual texture, limited observation viewpoints, or sparse lidar coverage) can be identified and removed. Our system is integrated with a real-time lidar SLAM system, which is used to bootstrap a Structure-from-Motion (SfM) reconstruction procedure; the SLAM system also properly constrains the overall metric scale, which is essential for the lidar depth loss. The refined SLAM trajectory is then divided into submaps using spectral clustering, which groups sets of co-visible images together. This submapping approach is more suitable for visual reconstruction than distance-based partitioning. Our uncertainty estimation is particularly effective when merging submaps, as their boundaries often contain artefacts caused by limited observations. We demonstrate the reconstruction system using a multi-camera lidar sensor suite in experiments involving both robot-mounted and handheld scanning. Our test datasets cover a total area of more than 20,000 square metres.
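The submapping step described above can be sketched as spectral clustering on a co-visibility graph. The following is a minimal illustration, not the paper's implementation: it assumes a symmetric matrix `W` of co-visibility weights between images (e.g. shared landmark counts) and performs a two-way spectral split via the Fiedler vector of the normalized graph Laplacian; the actual system's affinity definition and number of submaps may differ.

```python
import numpy as np

def submap_split(covis):
    """Two-way spectral partition of images from a co-visibility matrix.

    covis: symmetric (N, N) array; covis[i, j] is a hypothetical
    co-visibility weight between images i and j (0 = not co-visible).
    Returns a boolean label per image; recurse on each side for more submaps.
    """
    n = len(covis)
    deg = covis.sum(axis=1)
    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(n) - d_inv_sqrt[:, None] * covis * d_inv_sqrt[None, :]
    # Eigenvector of the second-smallest eigenvalue (Fiedler vector)
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]
    # Sign of the Fiedler vector gives the minimum normalized-cut split
    return fiedler >= 0.0

# Toy example: two groups of strongly co-visible images, weakly linked
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 10.0
W[2, 3] = W[3, 2] = 1.0  # weak cross-group overlap
labels = submap_split(W)
```

In this toy graph the split falls on the weak edge, so images 0-2 and 3-5 land in separate submaps; partitioning by co-visibility rather than distance keeps mutually observing views in one submap even when the trajectory doubles back.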