Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct urban outdoor scenes, which are large, unbounded, and highly detailed. To achieve accurate reconstructions, they therefore require additional supervision data such as LiDAR, strong geometric priors, and long training times. To tackle these issues, we present SCILLA, a new hybrid implicit surface learning method that reconstructs large driving scenes from 2D images. SCILLA's hybrid architecture models two separate implicit fields: one for the volumetric density and another for the signed distance to the surface. To accurately represent urban outdoor scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and to transition progressively from a volumetric to a surface representation. Unlike concurrent methods, our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene. Through extensive experiments on four outdoor driving datasets, we show that SCILLA learns an accurate and detailed 3D surface representation in various urban scenarios while training twice as fast as previous state-of-the-art solutions.
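As a rough illustration of what "sampling points near the surface" from a density estimate can look like, the sketch below uses the standard volume-rendering weights along a ray as a piecewise-constant probability distribution over depth and draws new depths from it by inverse-transform sampling. This is a generic importance-sampling scheme, not SCILLA's actual estimator, which the abstract does not specify; the function names `render_weights` and `sample_near_surface`, the bin layout, and the uniform fallback are all assumptions made for the example.

```python
import bisect
import math
import random


def render_weights(densities, deltas):
    """Standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    densities: per-bin volumetric density along one ray (hypothetical inputs).
    deltas:    per-bin lengths along the ray.
    """
    weights = []
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def sample_near_surface(bin_edges, weights, n_samples, rng):
    """Draw depths from the piecewise-constant PDF defined by the weights.

    Bins with large weight (where the ray most likely hits a surface)
    receive proportionally more samples.
    """
    total = sum(weights)
    if total <= 0.0:
        # Assumption: with no density information, fall back to uniform sampling.
        return sorted(rng.uniform(bin_edges[0], bin_edges[-1]) for _ in range(n_samples))

    # Cumulative distribution over bins.
    cdf = []
    running = 0.0
    for w in weights:
        running += w / total
        cdf.append(running)

    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # First bin whose cumulative mass exceeds u (clamped for rounding error).
        i = min(bisect.bisect_right(cdf, u), len(weights) - 1)
        lo = cdf[i - 1] if i > 0 else 0.0
        hi = cdf[i]
        frac = (u - lo) / (hi - lo) if hi > lo else 0.5
        frac = min(max(frac, 0.0), 1.0)
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)
```

For a ray with ten unit-length bins and density concentrated in the bin between depths 6 and 7, the weights put essentially all probability mass in that bin, so the drawn depths cluster there instead of being spread over the whole ray.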