Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids, but they fail to construct continuous and accurate obstacle surfaces. To this end, we propose SurroundSDF, which implicitly predicts a signed distance field (SDF) and a semantic field from surround images for continuous perception. Specifically, we introduce a query-based approach and use an SDF constrained by the Eikonal formulation to accurately describe obstacle surfaces. Furthermore, because precise SDF ground truth is unavailable, we propose a novel weakly supervised paradigm for SDF, referred to as the Sandwich Eikonal formulation, which applies correct and dense constraints on both sides of the surface and thereby improves the accuracy of the perceived surface. Experiments suggest that our method achieves state-of-the-art (SOTA) results for both occupancy prediction and 3D scene reconstruction on the nuScenes dataset.
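For readers unfamiliar with the Eikonal constraint mentioned above, the following is a minimal sketch of the standard property it encodes: a true signed distance field f satisfies |∇f| = 1 almost everywhere. The sketch is not the Sandwich Eikonal formulation, whose details the abstract does not give. The sphere field, the sample points, and every function name are assumptions chosen only for illustration.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance to an origin-centred sphere: negative inside, positive outside."""
    x, y, z = p
    return math.sqrt(x * x + y * y + z * z) - radius

def gradient_norm(sdf, p, h=1e-4):
    """Estimate |grad sdf(p)| with central finite differences."""
    grads = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        grads.append((sdf(hi) - sdf(lo)) / (2.0 * h))
    return math.sqrt(sum(g * g for g in grads))

def eikonal_residual(sdf, points):
    """Mean of (|grad sdf| - 1)^2 over the points; near zero for a true distance field."""
    return sum((gradient_norm(sdf, p) - 1.0) ** 2 for p in points) / len(points)

# Hypothetical sample points on both sides of the surface (|p| < 1 inside, |p| > 1 outside).
points = [(0.2, 0.1, 0.3), (0.5, -0.4, 0.2), (1.5, 0.0, 0.5), (-2.0, 1.0, 0.7)]
residual = eikonal_residual(sphere_sdf, points)

# A scaled field shares the same zero level set but violates the constraint.
def scaled_sdf(p):
    return 2.0 * sphere_sdf(p)

scaled_residual = eikonal_residual(scaled_sdf, points)
```

In this toy setting the exact sphere field gives a residual near zero, while the scaled field, which describes the same surface, does not. This shows why a unit-gradient penalty can regularize a learned field when exact distance labels are missing.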