In this technical report, we present our solution, named UniOcc, for the Vision-Centric 3D occupancy prediction track in the nuScenes Open Dataset Challenge at CVPR 2023. Existing methods for occupancy prediction primarily focus on optimizing projected features in 3D volume space using 3D occupancy labels. However, generating these labels is complex and expensive because it relies on 3D semantic annotations, and because the labels are limited by voxel resolution, they cannot provide fine-grained spatial semantics. To address this limitation, we propose a novel Unifying Occupancy (UniOcc) prediction method that explicitly imposes spatial geometry constraints and complements the occupancy labels with fine-grained semantic supervision through volume ray rendering. Our method significantly enhances model performance and shows promising potential for reducing human annotation costs. Given the laborious nature of annotating 3D occupancy, we further introduce a Depth-aware Teacher Student (DTS) framework to enhance prediction accuracy using unlabeled data. Our solution achieves 51.27% mIoU on the official leaderboard with a single model, placing 3rd in this challenge.
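To make the volume ray rendering idea concrete, the following is a minimal sketch of how per-sample density and semantic predictions along a single camera ray could be composited into a depth value and a semantic vector, which can then be supervised in 2D. It assumes the standard NeRF-style compositing rule; the report's exact formulation is not given here, and the function name `render_ray` and its arguments are hypothetical.

```python
import numpy as np

def render_ray(density, semantics, deltas):
    """Composite samples along one ray (NeRF-style; an assumption, not the report's exact method).

    density:   (N,) non-negative density at each sample
    semantics: (N, C) per-sample class scores
    deltas:    (N,) length of the ray segment covered by each sample
    """
    # Opacity of each segment: how much of the ray it absorbs.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = trans * alpha
    # Distance from the ray origin to the midpoint of each segment.
    midpoints = np.cumsum(deltas) - 0.5 * deltas
    depth = float(np.sum(weights * midpoints))   # expected termination depth
    rendered_semantics = weights @ semantics     # (C,) weighted class scores
    return depth, rendered_semantics, weights

# Usage: four unit-length samples, with an opaque surface at the third one.
density = np.array([0.0, 0.0, 1e3, 0.0])
semantics = np.eye(4)[:, :3]                     # toy per-sample class scores
depth, sem, weights = render_ray(density, semantics, np.ones(4))
```

In a sketch like this, the rendered depth could be compared against a depth target and the rendered semantics against 2D segmentation labels, which is one way supervision finer than the voxel grid could be obtained.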