3D semantic occupancy prediction is an essential task in autonomous driving, focusing on capturing the geometric details of scenes. Off-road environments are rich in geometric information, making them well suited to reconstruction via 3D semantic occupancy prediction. However, most research concentrates on on-road environments, and few methods are designed for off-road 3D semantic occupancy prediction, owing to the lack of relevant datasets and benchmarks. To fill this gap, we introduce WildOcc, to our knowledge the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction. We propose a ground-truth generation pipeline that employs coarse-to-fine reconstruction to achieve more realistic results. Moreover, we introduce a multi-modal 3D semantic occupancy prediction framework that fuses spatio-temporal information from multi-frame images and point clouds at the voxel level. In addition, a cross-modality distillation function is introduced, which transfers geometric knowledge from point clouds to image features.
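The cross-modality distillation idea can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact formulation: the image branch's voxel features are pulled toward the point-cloud branch's voxel features via a mean-squared-error penalty, so that geometric knowledge learned from LiDAR supervises the camera features.

```python
# Hypothetical sketch of a cross-modality distillation loss: the
# point-cloud voxel features act as the (fixed) teacher and the
# image voxel features as the student. The function names and the
# choice of MSE are illustrative assumptions, not the paper's API.

def distillation_loss(img_voxel_feats, pc_voxel_feats):
    """Mean squared error between corresponding voxel feature vectors.

    img_voxel_feats, pc_voxel_feats: lists of equal-length feature
    vectors, one per occupied voxel. In training, only the image
    branch would receive gradients from this loss.
    """
    total, count = 0.0, 0
    for img_f, pc_f in zip(img_voxel_feats, pc_voxel_feats):
        for a, b in zip(img_f, pc_f):
            total += (a - b) ** 2
            count += 1
    return total / count
```

For example, with a single voxel whose image feature is `[0.0, 2.0]` and point-cloud feature is `[2.0, 2.0]`, the loss is 2.0; identical features give a loss of 0.0.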