Vision-centric semantic occupancy prediction plays a crucial role in autonomous driving, which demands accurate and reliable predictions from low-cost sensors. Although camera-based models have notably narrowed the accuracy gap with LiDAR, little research effort has been devoted to the reliability of predicting semantic occupancy from cameras. In this paper, we conduct the first comprehensive evaluation of existing semantic occupancy prediction models from a reliability perspective. Despite the gradual alignment of camera-based models with LiDAR in terms of accuracy, a significant reliability gap persists. To address this concern, we propose ReliOcc, a method designed to enhance the reliability of camera-based occupancy networks. ReliOcc provides a plug-and-play scheme for existing models that integrates hybrid uncertainty from individual voxels with sampling-based noise and from relative voxels through mix-up learning. In addition, an uncertainty-aware calibration strategy is devised to further enhance model reliability in an offline mode. Extensive experiments under various settings demonstrate that ReliOcc significantly enhances model reliability while maintaining the accuracy of both geometric and semantic predictions. Importantly, the proposed approach is robust to sensor failures and out-of-domain noise during inference.
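The hybrid-uncertainty idea can be illustrated with a minimal sketch. The following is a hypothetical NumPy formulation, not the paper's implementation: individual-voxel uncertainty is estimated as predictive variance under sampled feature noise, relative-voxel uncertainty as disagreement (here, symmetric KL divergence, an assumed choice) between paired voxels, and the two are combined with a Beta-sampled mix-up weight. All function names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def individual_uncertainty(logits_fn, voxel_feats, n_samples=8, noise_std=0.1):
    # Sampling-based uncertainty for each voxel: perturb features with
    # Gaussian noise and take the variance of the predicted logits
    # across samples, averaged over classes (hypothetical formulation).
    samples = []
    for _ in range(n_samples):
        noisy = voxel_feats + rng.normal(0.0, noise_std, voxel_feats.shape)
        samples.append(logits_fn(noisy))
    return np.stack(samples).var(axis=0).mean(axis=-1)  # shape (N,)

def relative_uncertainty(probs, pairs):
    # Relative uncertainty from paired (e.g. neighboring) voxels,
    # measured as symmetric KL divergence between their class
    # distributions and accumulated onto the first voxel of each pair.
    eps = 1e-8
    p, q = probs[pairs[:, 0]], probs[pairs[:, 1]]
    kl = lambda a, b: (a * np.log((a + eps) / (b + eps))).sum(-1)
    u = np.zeros(len(probs))
    np.add.at(u, pairs[:, 0], 0.5 * (kl(p, q) + kl(q, p)))
    return u  # shape (N,)

def mixup_uncertainty(u_ind, u_rel, alpha=0.4):
    # Mix-up learning step: convex combination of the two uncertainty
    # sources with a weight drawn from a Beta distribution.
    lam = rng.beta(alpha, alpha)
    return lam * u_ind + (1.0 - lam) * u_rel
```

A convex combination keeps the hybrid score non-negative and on the same scale as its inputs, so it can be supervised or thresholded like either source alone.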