3D semantic occupancy prediction networks have demonstrated remarkable capabilities in reconstructing the geometric and semantic structure of 3D scenes, providing crucial information for robot navigation and autonomous driving systems. However, because their dense network designs incur large computational overhead, existing networks struggle to balance accuracy and latency. In this paper, we introduce OccRWKV, an efficient semantic occupancy network inspired by Receptance Weighted Key Value (RWKV). OccRWKV separates semantics, occupancy prediction, and feature fusion into distinct branches, each incorporating Sem-RWKV and Geo-RWKV blocks. These blocks are designed to capture long-range dependencies, enabling the network to learn domain-specific representations (i.e., semantics and geometry), which improves prediction accuracy. Leveraging the sparse nature of real-world 3D occupancy, we reduce computational overhead by projecting features into the bird's-eye view (BEV) space and propose a BEV-RWKV block for efficient feature enhancement and fusion. This enables real-time inference at 22.2 FPS without compromising performance. Experiments demonstrate that OccRWKV outperforms state-of-the-art methods on the SemanticKITTI dataset, achieving an mIoU of 25.1 while running 20 times faster than the best baseline, Co-Occ, which makes it suitable for real-time deployment on robots to improve autonomous navigation efficiency. Code and video are available on our project page: https://jmwang0117.github.io/OccRWKV/.
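The abstract does not say how features are projected into BEV space. As a minimal sketch of one common approach, not necessarily the one OccRWKV uses, the height axis of a dense voxel feature grid can be folded into the channel axis, so that later layers run 2D operations over the ground plane instead of 3D operations over the full volume. The function name `voxels_to_bev`, the `(C, X, Y, Z)` layout, and the grid sizes are assumptions made for illustration.

```python
import numpy as np


def voxels_to_bev(voxel_feats: np.ndarray) -> np.ndarray:
    """Fold the height axis of a (C, X, Y, Z) voxel grid into channels.

    Hypothetical helper: returns a (C * Z, X, Y) bird's-eye-view feature map.
    """
    c, x, y, z = voxel_feats.shape
    # (C, X, Y, Z) -> (C, Z, X, Y) -> (C * Z, X, Y)
    return voxel_feats.transpose(0, 3, 1, 2).reshape(c * z, x, y)


# Toy grid: 4 channels over an 8 x 8 x 2 volume (sizes are illustrative only).
grid = np.arange(4 * 8 * 8 * 2, dtype=np.float32).reshape(4, 8, 8, 2)
bev = voxels_to_bev(grid)
print(bev.shape)  # (8, 8, 8)
```

In this layout, BEV channel `c * Z + z` at ground cell `(x, y)` holds the original feature `grid[c, x, y, z]`, so no information is discarded; pooling over height (for example, taking the maximum along Z) is an alternative that trades information for fewer channels.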