A comprehensive understanding of 3D scenes is crucial for autonomous vehicles (AVs), and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction rely heavily on surround-view camera images, making them susceptible to changes in lighting and weather conditions. This paper introduces OccFusion, a novel sensor fusion framework for predicting 3D occupancy. By integrating features from additional sensors, such as lidar and surround-view radars, our framework enhances the accuracy and robustness of occupancy prediction, achieving top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments on the nuScenes and SemanticKITTI datasets, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OccFusion.
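To make the fusion idea concrete, below is a minimal sketch of multi-modal feature fusion, assuming camera, lidar, and radar features have already been projected onto a shared bird's-eye-view (BEV) grid. The `SimpleBEVFusion` module, channel sizes, and concatenate-then-convolve design are illustrative assumptions for exposition, not OccFusion's actual architecture.

```python
# Hypothetical sketch: fusing camera, lidar, and radar BEV features by
# concatenation followed by a convolution. Names and channel sizes are
# illustrative assumptions, not OccFusion's implementation.
import torch
import torch.nn as nn

class SimpleBEVFusion(nn.Module):
    def __init__(self, cam_ch=128, lidar_ch=64, radar_ch=32, out_ch=128):
        super().__init__()
        # Project the concatenated multi-modal features to a shared channel size.
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch + radar_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev, radar_bev):
        # All inputs are assumed to share the same BEV grid: (B, C_i, H, W).
        return self.fuse(torch.cat([cam_bev, lidar_bev, radar_bev], dim=1))

# Example usage on a 200x200 BEV grid.
fusion = SimpleBEVFusion()
cam = torch.randn(1, 128, 200, 200)
lidar = torch.randn(1, 64, 200, 200)
radar = torch.randn(1, 32, 200, 200)
fused = fusion(cam, lidar, radar)  # -> (1, 128, 200, 200)
```

The intuition behind such a design is that lidar and radar features remain informative when camera features degrade (e.g., at night or in rain), so the fused representation is more robust than a camera-only one.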