3D occupancy prediction based on multi-sensor fusion, which is crucial for a reliable autonomous driving system, enables fine-grained understanding of 3D scenes. Previous fusion-based 3D occupancy prediction methods relied on depth estimation to process 2D image features. However, depth estimation is an ill-posed problem, which limits the accuracy and robustness of these methods. Furthermore, fine-grained occupancy prediction demands extensive computational resources. To address these issues, we propose OccFusion, a depth-estimation-free multi-modal fusion framework. Additionally, we introduce a generalizable active training method and an active decoder that can be applied to any occupancy prediction model, with the potential to enhance its performance. Experiments conducted on nuScenes-Occupancy and nuScenes-Occ3D demonstrate our framework's superior performance, and detailed ablation studies confirm the effectiveness of each proposed component.