Training deep learning models for semantic occupancy prediction is challenging due to factors such as the large number of occupancy cells, severe occlusion, limited visual cues, and complicated driving scenarios. Recent methods often adopt transformer-based architectures given their strong capability in learning input-conditioned weights and long-range relationships. However, transformer-based networks are notorious for their quadratic computation complexity, which seriously undermines their efficiency and deployment in semantic occupancy prediction. Inspired by the global modeling capability and linear computation complexity of the Mamba architecture, we present the first Mamba-based network for semantic occupancy prediction, termed OccMamba. However, directly applying the Mamba architecture to the occupancy prediction task yields unsatisfactory performance due to the inherent domain gap between the linguistic and 3D domains. To relieve this problem, we present a simple yet effective 3D-to-1D reordering operation, i.e., height-prioritized 2D Hilbert expansion. It maximally retains the spatial structure of point clouds and facilitates processing by the Mamba blocks. Our OccMamba achieves state-of-the-art performance on three prevalent occupancy prediction benchmarks: OpenOccupancy, SemanticKITTI, and SemanticPOSS. Notably, on OpenOccupancy, OccMamba outperforms the previous state-of-the-art Co-Occ by 3.1% in IoU and 3.2% in mIoU. Code will be released upon publication.
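The 3D-to-1D reordering mentioned above can be illustrated with a minimal sketch. The exact serialization used by OccMamba is defined in the paper itself; the snippet below assumes one plausible reading of "height-prioritized 2D Hilbert expansion": voxels are ordered by the 2D Hilbert-curve distance of their (x, y) coordinates, with all heights of a given column visited consecutively, so that a single 1D sequence preserves 2D spatial locality. The function names (`xy2d`, `hilbert_reorder`) and the grid size `n` are illustrative, not taken from the released code.

```python
def xy2d(n: int, x: int, y: int) -> int:
    """Map a 2D cell (x, y) on an n-by-n grid (n a power of two)
    to its distance along the Hilbert curve (standard iterative form)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_reorder(voxels, n: int = 16):
    """Serialize (x, y, z) voxel indices into a 1D order: follow the 2D
    Hilbert curve over the (x, y) plane, expanding each column along
    the height axis z (an assumed reading of 'height-prioritized')."""
    return sorted(voxels, key=lambda v: (xy2d(n, v[0], v[1]), v[2]))


# Toy usage: a 2x2 plane with two height levels per column.
cells = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
order = hilbert_reorder(cells, n=2)
# Neighboring cells in the 1D sequence stay spatially close in 3D,
# which is the locality property that benefits sequence models.
```

The key property is that the Hilbert curve keeps 1D-adjacent tokens 3D-adjacent far better than a naive row-major flattening, which is what the abstract means by "maximally retaining the spatial structure" for the Mamba blocks.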