Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics. However, most existing approaches depend on high image resolutions and complex networks to achieve top performance, hindering their deployment in practical scenarios. Moreover, most multi-sensor fusion approaches focus on improving the fused features while overlooking supervision strategies for those features. To this end, we propose DAOcc, a novel multi-modal occupancy prediction framework that leverages 3D object detection supervision to help achieve superior performance, while using a deployment-friendly image feature extraction network and a practical input image resolution. Furthermore, we introduce a BEV View Range Extension strategy to mitigate the adverse effects of the reduced image resolution. Experimental results show that DAOcc achieves new state-of-the-art performance on the Occ3D-nuScenes and SurroundOcc benchmarks, surpassing other methods by a significant margin while using only ResNet50 and a 256×704 input image resolution. Code will be made available at https://github.com/AlphaPlusTT/DAOcc.