Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct urban outdoor scenes because these scenes are large, unbounded, and highly detailed. As a result, accurate reconstruction typically requires additional supervision data such as LiDAR, strong geometric priors, and long training times. To tackle these issues, we present SCILLA, a new hybrid implicit surface learning method that reconstructs large driving scenes from 2D images. SCILLA's hybrid architecture models two separate implicit fields: one for the volumetric density and another for the signed distance to the surface. To accurately represent urban outdoor scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and to transition progressively from a volumetric to a surface representation. Unlike concurrent methods, our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene. Through extensive experiments on four outdoor driving datasets, we show that SCILLA can learn an accurate and detailed 3D surface scene representation in various urban scenarios while training twice as fast as previous state-of-the-art solutions.
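To make the idea of density-guided sampling near the surface concrete, the following is a minimal sketch and not SCILLA's actual implementation. It assumes the standard volume-rendering weights used in NeRF-style methods and draws new samples along a ray by inverse-transform sampling from those weights. The function names `render_weights` and `sample_near_surface`, the toy Gaussian density, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def render_weights(sigma, t):
    """Standard volume-rendering weights w_i = T_i * alpha_i for one ray.

    sigma: densities at the N sample depths t (assumed given, shape (N,)).
    Returns one weight per interval between consecutive depths, shape (N-1,).
    """
    delta = np.diff(t)                             # interval lengths
    alpha = 1.0 - np.exp(-sigma[:-1] * delta)      # opacity of each interval
    # Transmittance: probability that the ray reaches interval i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha

def sample_near_surface(t, weights, n_samples, rng):
    """Inverse-transform sampling of depths from the piecewise-constant
    distribution defined by the weights (an assumed, generic strategy)."""
    pdf = weights + 1e-8                           # avoid division by zero
    pdf = pdf / pdf.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])  # shape (N,)
    u = rng.uniform(size=n_samples)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(pdf) - 1)
    frac = np.clip((u - cdf[idx]) / pdf[idx], 0.0, 1.0)
    return t[idx] + frac * (t[idx + 1] - t[idx])

# Toy ray: a density bump centered at depth 5 stands in for a surface.
t = np.linspace(0.0, 10.0, 201)
sigma = 50.0 * np.exp(-0.5 * ((t - 5.0) / 0.2) ** 2)
weights = render_weights(sigma, t)
samples = sample_near_surface(t, weights, 64, np.random.default_rng(0))
```

In this toy setting the drawn depths cluster just in front of the density peak, which is where the ray's transmittance drops. A surface model such as a signed distance field could then be queried mainly at those depths rather than uniformly along the ray.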