3D semantic occupancy prediction, which predicts the fine-grained geometry and semantics of the surrounding scene, is an important task for robust vision-centric autonomous driving. Most existing methods use dense grid-based scene representations and overlook the spatial sparsity of driving scenes. Although 3D semantic Gaussians offer an object-centric sparse alternative, most of the Gaussians still describe empty regions, which is inefficient. To address this, we propose a probabilistic Gaussian superposition model that interprets each Gaussian as a probability distribution of its neighborhood being occupied and combines these probabilities through probabilistic multiplication to derive the overall geometry. Furthermore, we adopt an exact Gaussian mixture model to compute semantics, which avoids unnecessary overlap between Gaussians. To initialize Gaussians effectively in non-empty regions, we design a distribution-based initialization module that learns the pixel-aligned occupancy distribution instead of the depth of surfaces. Extensive experiments on the nuScenes and KITTI-360 datasets show that our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.