As autonomous driving moves toward full scene understanding, 3D semantic occupancy prediction has emerged as a crucial perception task, offering voxel-level semantics beyond traditional detection and segmentation paradigms. However, such a refined scene representation incurs prohibitive computation and memory overhead, posing a major barrier to practical real-time deployment. To address this, we propose SUG-Occ, an explicit Semantics and Uncertainty Guided sparse learning framework for 3D occupancy prediction, which exploits the inherent sparsity of 3D scenes to reduce redundant computation while maintaining geometric and semantic completeness. Specifically, we first utilize semantic and uncertainty priors to suppress projections from free space during view transformation, while employing an explicit unsigned distance encoding to enhance geometric consistency, producing a structurally consistent sparse 3D representation. Second, we design a cascade sparse completion module based on hyper cross sparse convolution and generative upsampling to enable efficient coarse-to-fine reasoning. Finally, we devise an object contextual representation (OCR) based mask decoder that aggregates global semantic context from sparse features and refines voxel-wise predictions via lightweight query-context interactions, avoiding expensive attention operations over volumetric features. Extensive experiments on the SemanticKITTI benchmark demonstrate that the proposed approach outperforms the baselines, achieving a 7.34\% improvement in accuracy and a 57.8\% gain in efficiency.