3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving: it predicts the fine-grained geometry and semantics of the surrounding scene. Most existing methods rely on dense grid-based scene representations and overlook the spatial sparsity of driving scenes. Although 3D semantic Gaussians serve as an object-centric sparse alternative, most of the Gaussians still describe empty regions, which is inefficient. To address this, we propose a probabilistic Gaussian superposition model that interprets each Gaussian as a probability distribution of its neighborhood being occupied and applies probabilistic multiplication to derive the overall geometry. Furthermore, we adopt the exact Gaussian mixture model for semantics calculation to avoid unnecessary overlapping of Gaussians. To effectively initialize Gaussians in non-empty regions, we design a distribution-based initialization module that learns the pixel-aligned occupancy distribution instead of the depth of surfaces. We conduct extensive experiments on the nuScenes and KITTI-360 datasets, and our GaussianFormer-2 achieves state-of-the-art performance with high efficiency. Code: https://github.com/huang-yh/GaussianFormer.
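The two aggregation rules named in the abstract can be illustrated with a small sketch. It is one plausible reading of the abstract, not the released implementation: occupancy at a point is taken as one minus the product of each Gaussian's "not occupied" probability, and semantics are taken as a mixture of per-Gaussian class probabilities weighted by posterior responsibilities. All function names, the dictionary layout, and the axis-aligned (diagonal) covariance are assumptions made to keep the example short and dependency-free.

```python
import math


def gaussian_alpha(x, mean, scale):
    """Unnormalized Gaussian kernel in [0, 1], read here as the probability
    that point x is occupied according to one Gaussian.
    Assumption: axis-aligned covariance diag(scale**2)."""
    d2 = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, scale))
    return math.exp(-0.5 * d2)


def occupancy(x, gaussians):
    """Probabilistic multiplication: x is empty only if every Gaussian
    independently leaves it empty."""
    p_empty = 1.0
    for g in gaussians:
        p_empty *= 1.0 - gaussian_alpha(x, g["mean"], g["scale"])
    return 1.0 - p_empty


def gaussian_density(x, mean, scale):
    """Normalized density of an axis-aligned Gaussian."""
    norm = 1.0
    for s in scale:
        norm *= s * math.sqrt(2.0 * math.pi)
    return gaussian_alpha(x, mean, scale) / norm


def semantics(x, gaussians):
    """Gaussian mixture: per-Gaussian class probabilities averaged with
    posterior responsibilities, so the output stays normalized."""
    weights = [
        g["weight"] * gaussian_density(x, g["mean"], g["scale"])
        for g in gaussians
    ]
    total = sum(weights)
    n_cls = len(gaussians[0]["probs"])
    if total == 0.0:
        # No Gaussian has numerical support here; fall back to uniform.
        return [1.0 / n_cls] * n_cls
    return [
        sum(w * g["probs"][c] for w, g in zip(weights, gaussians)) / total
        for c in range(n_cls)
    ]


# Hypothetical two-Gaussian scene with two semantic classes.
gaussians = [
    {"mean": (0.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0),
     "weight": 1.0, "probs": [0.9, 0.1]},
    {"mean": (4.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0),
     "weight": 1.0, "probs": [0.2, 0.8]},
]

occ_mid = occupancy((2.0, 0.0, 0.0), gaussians)
sem_mid = semantics((2.0, 0.0, 0.0), gaussians)
```

Under this reading, adding a Gaussian can only raise the occupancy at a point and never pushes it above 1, while the mixture normalization keeps the semantic output a valid distribution no matter how many Gaussians overlap. Both properties follow from the sketch's assumptions rather than from details stated in the abstract.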