This paper introduces OccFusion, a straightforward and efficient sensor fusion framework for predicting 3D occupancy. A comprehensive understanding of 3D scenes is crucial in autonomous driving, and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects of varied shapes and classes. However, existing methods for 3D occupancy prediction rely heavily on surround-view camera images, making them susceptible to changes in lighting and weather conditions. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments on the nuScenes dataset, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OCCFusion.
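To make the fusion idea concrete, below is a minimal sketch of how per-voxel features from camera, lidar, and radar branches could be concatenated and fused into 3D semantic occupancy logits. This is an illustrative assumption, not the paper's actual architecture: the module name, channel widths, class count, and grid resolution are all hypothetical, and it presumes each modality has already been encoded into a shared ego-centric voxel grid.

```python
import torch
import torch.nn as nn

class MultiSensorOccupancyFusion(nn.Module):
    """Hypothetical sketch: fuse camera, lidar, and radar voxel features
    into per-voxel semantic occupancy logits. Channel sizes and class
    count are illustrative, not taken from the OccFusion paper."""

    def __init__(self, cam_ch=128, lidar_ch=64, radar_ch=32, num_classes=17):
        super().__init__()
        fused_ch = cam_ch + lidar_ch + radar_ch
        # Fuse the concatenated multi-modal features with a small 3D conv block.
        self.fuse = nn.Sequential(
            nn.Conv3d(fused_ch, 128, kernel_size=3, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(inplace=True),
        )
        # Per-voxel classification head over semantic classes (incl. free space).
        self.head = nn.Conv3d(128, num_classes, kernel_size=1)

    def forward(self, cam_vox, lidar_vox, radar_vox):
        # Each input: (B, C_modality, X, Y, Z), already aligned to a
        # common ego-centric volume by its modality-specific encoder.
        x = torch.cat([cam_vox, lidar_vox, radar_vox], dim=1)
        return self.head(self.fuse(x))  # (B, num_classes, X, Y, Z)

# Toy usage on a coarse voxel grid (resolution chosen for illustration).
model = MultiSensorOccupancyFusion()
cam = torch.randn(1, 128, 100, 100, 8)
lidar = torch.randn(1, 64, 100, 100, 8)
radar = torch.randn(1, 32, 100, 100, 8)
logits = model(cam, lidar, radar)
print(logits.shape)  # torch.Size([1, 17, 100, 100, 8])
```

Concatenation followed by a shared convolution is only one plausible fusion choice; attention-based or gated fusion schemes are common alternatives in this setting.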