Semantic occupancy perception is essential for autonomous driving, as automated vehicles require fine-grained perception of 3D urban structures. However, existing benchmarks lack diversity in urban scenes and evaluate only front-view predictions. Towards comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark. In OpenOccupancy, we extend the large-scale nuScenes dataset with dense semantic occupancy annotations. Previous annotations rely on superimposing LiDAR points, so some occupancy labels are missed because of the sparse LiDAR channels. To mitigate this problem, we introduce the Augmenting And Purifying (AAP) pipeline, which densifies the annotations by ~2x and involves ~4000 human hours of labeling. We also establish camera-based, LiDAR-based, and multi-modal baselines for the OpenOccupancy benchmark. Furthermore, because the complexity of surrounding occupancy perception lies in the computational burden of high-resolution 3D predictions, we propose the Cascade Occupancy Network (CONet) to refine coarse predictions, which yields a relative performance improvement of ~30% over the baseline. We hope the OpenOccupancy benchmark will boost the development of surrounding occupancy perception algorithms.
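The computational argument for refining a coarse prediction can be illustrated with a small sketch. This is not CONet's implementation, which the abstract does not describe; it only shows the general coarse-to-fine idea under stated assumptions: a coarse stage has already marked some voxels as occupied, and a finer stage is queried only inside those voxels. The function name `refine_candidates`, the grid shape, and the upsampling factor are all hypothetical.

```python
from itertools import product


def refine_candidates(occupied_coarse, factor=2):
    """Return fine-grid voxel indices that lie inside occupied coarse voxels.

    occupied_coarse: iterable of (x, y, z) integer indices on the coarse grid.
    factor: assumed per-axis upsampling ratio between coarse and fine grids.
    """
    # Every sub-voxel offset inside one coarse voxel: factor**3 of them.
    offsets = list(product(range(factor), repeat=3))
    return [
        (x * factor + dx, y * factor + dy, z * factor + dz)
        for (x, y, z) in occupied_coarse
        for (dx, dy, dz) in offsets
    ]


# Toy example (hypothetical sizes): a 4x4x2 coarse grid with two occupied voxels.
coarse_shape = (4, 4, 2)
occupied = [(1, 2, 0), (3, 0, 1)]
factor = 2

fine = refine_candidates(occupied, factor)

# A dense fine prediction would have to evaluate every fine voxel.
dense_queries = coarse_shape[0] * coarse_shape[1] * coarse_shape[2] * factor**3

print(len(fine), "fine voxels queried instead of", dense_queries)
```

In this toy case only the sub-voxels of the two occupied coarse voxels are evaluated at the finer resolution, while a dense prediction would evaluate the whole fine grid. The saving grows with the fraction of empty space, which is typically large in driving scenes.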