Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

Neural Radiance Fields (NeRF) have recently achieved impressive results in novel view synthesis, and Block-NeRF has shown that NeRF can be scaled up to build city-scale models. Modeling at that scale requires a massive amount of image data, which fleets of specially designed data-collection vehicles cannot supply. How to acquire massive, high-quality data therefore remains an open problem. Because the automotive industry already holds a huge amount of image data, crowd-sourcing is a convenient route to large-scale data collection. In this paper, we present a crowd-sourced framework that uses the substantial data captured by production vehicles to reconstruct scenes with a NeRF model. The framework addresses the key problem of large-scale reconstruction, namely where the data come from and how to use them. First, the crowd-sourced data are filtered to remove redundancy and to keep a balanced distribution over time and space. Then, a structure-from-motion module refines the camera poses. Finally, the images and their refined poses are used to train the NeRF model for a given block. The framework integrates multiple modules: data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of the ground surface, and occlusion completion. The complete system can effectively process crowd-sourced data and reconstruct high-quality 3D scenes. We conduct extensive quantitative and qualitative experiments to validate its performance. Moreover, we propose an application, named first-view navigation, which leverages the NeRF model to generate a 3D street view and guide the driver with a synthesized video.
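As an illustration of the data-selection step, the following minimal sketch shows one way to remove redundancy while keeping the selection balanced over space and time. It is not the selection procedure used in this work: the `Frame` schema, the grid-and-time-bucket binning, and the parameters `cell_size_m`, `time_bucket_s`, and `max_per_bin` are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    """One crowd-sourced image record (hypothetical schema)."""
    image_id: str
    x: float          # position in a local metric frame, metres
    y: float          # position in a local metric frame, metres
    timestamp: float  # capture time, seconds


def select_balanced(
    frames: List[Frame],
    cell_size_m: float = 10.0,      # assumed spatial bin size
    time_bucket_s: float = 3600.0,  # assumed temporal bin size
    max_per_bin: int = 2,           # assumed cap per (space, time) bin
) -> List[Frame]:
    """Keep at most `max_per_bin` frames per spatial cell and time bucket."""
    bins: Dict[Tuple[int, int, int], List[Frame]] = defaultdict(list)
    for f in frames:
        key = (
            math.floor(f.x / cell_size_m),
            math.floor(f.y / cell_size_m),
            math.floor(f.timestamp / time_bucket_s),
        )
        bins[key].append(f)

    selected: List[Frame] = []
    for key in sorted(bins):
        # Deterministic order: earliest frames in each bin are kept.
        group = sorted(bins[key], key=lambda f: (f.timestamp, f.image_id))
        selected.extend(group[:max_per_bin])
    return selected


if __name__ == "__main__":
    # Five near-duplicate frames in one cell, plus one frame elsewhere.
    demo = [Frame(f"a{i}", 1.0 + 0.1 * i, 1.0, 100.0 + i) for i in range(5)]
    demo.append(Frame("b0", 55.0, 1.0, 100.0))
    print([f.image_id for f in select_balanced(demo)])
```

Capping each bin, rather than sampling uniformly from all frames, keeps heavily travelled road segments from dominating the training set. A production system would likely also account for viewing direction and image quality, which this sketch ignores.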
