We present StreetSurf, a novel multi-view implicit surface reconstruction technique that is readily applicable to street view images in widely used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interest. Existing approaches to street views either focus mainly on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither line of work investigates multi-view implicit surface reconstruction, especially in settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by unbounded street views captured along non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts (close-range, distant-view and sky) with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with a road-surface initialization scheme for a finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors estimated by general-purpose monocular models. Coupled with our implementation of an efficient and fine-grained multi-stage ray marching strategy, we achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time on a single RTX 3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.
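To make the close-range/distant-view split concrete, the following is a minimal sketch of how a camera ray might be divided at an axis-aligned cuboid boundary, using a standard slab-based ray-box intersection. It is an illustration only, not the StreetSurf implementation: the function names `ray_cuboid_interval` and `split_ray`, the cuboid extents, and the choice to leave the sky unmodeled here are all assumptions introduced for this example.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def ray_cuboid_interval(
    origin: Sequence[float],
    direction: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> Optional[Interval]:
    """Slab test: return (t_near, t_far) for the part of the ray
    origin + t * direction (t >= 0) inside the cuboid, or None if it misses."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def split_ray(
    origin: Sequence[float],
    direction: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> Dict[str, object]:
    """Hypothetical helper: assign the in-cuboid interval to the close-range
    model and everything past the exit point to the distant-view model."""
    hit = ray_cuboid_interval(origin, direction, box_min, box_max)
    if hit is None:
        # The ray never enters the close-range cuboid.
        return {"close_range": None, "distant_from": 0.0}
    return {"close_range": hit, "distant_from": hit[1]}


if __name__ == "__main__":
    # Assumed cuboid: long along x (driving direction), narrow in y and z.
    box_min, box_max = (-10.0, -5.0, -2.0), (50.0, 5.0, 8.0)
    print(split_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), box_min, box_max))
```

In this sketch, samples with `t` inside the `close_range` interval would be evaluated by the close-range representation and samples beyond `distant_from` by the distant-view one; a sky model would presumably depend on the ray direction alone and is omitted here.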