In this technical report, we present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track of the nuScenes Open-Occ Dataset Challenge at CVPR 2024. Our approach is a two-stage framework that improves 3D occupancy and flow prediction through adaptive forward view transformation and explicit flow modeling. We first train the occupancy model on its own, then train flow prediction with sequential frame integration. To handle scale variations across scenes, our method combines regression with classification. It also uses the predicted flow to warp current voxel features to future frames, supervised by future-frame ground truth. Experimental results on the nuScenes dataset show significant improvements in accuracy and robustness, indicating that the approach is effective in real-world scenarios. Our single model based on Swin-Base ranks second on the public leaderboard, which supports the potential of our method for advancing autonomous driving perception systems.
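The two flow-related ideas above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this report: the abstract does not say how regression and classification are combined, so the sketch assumes one common pattern (classify a coarse magnitude bin, then regress a residual within it). The function names `decode_flow` and `warp_voxels`, the bin centers, and the nearest-voxel forward warp with summed collisions are all hypothetical choices made for illustration. The supervision of the warped features by future-frame ground truth is omitted.

```python
import numpy as np

def decode_flow(bin_logits, residual, bin_centers):
    """Assumed scheme: pick the most likely magnitude bin (classification),
    then add a regressed residual to the chosen bin center.
    bin_logits: (N, B), residual: (N,), bin_centers: (B,)."""
    idx = bin_logits.argmax(axis=-1)
    return bin_centers[idx] + residual

def warp_voxels(feat, flow):
    """Forward-warp voxel features along a per-voxel flow field.
    feat: (X, Y, Z, C) features; flow: (X, Y, Z, 3) displacement in voxel units.
    Targets are rounded to the nearest voxel; voxels that leave the grid are
    dropped, and features landing on the same voxel are summed (a simplification)."""
    X, Y, Z, _ = feat.shape
    grid = np.stack(
        np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"),
        axis=-1,
    )
    target = np.rint(grid + flow).astype(int)
    valid = np.all((target >= 0) & (target < np.array([X, Y, Z])), axis=-1)
    out = np.zeros_like(feat)
    t = target[valid]  # (K, 3) destination indices of the voxels that stay in bounds
    np.add.at(out, (t[:, 0], t[:, 1], t[:, 2]), feat[valid])
    return out

# Toy usage: one occupied voxel moving one step along x.
feat = np.zeros((4, 4, 2, 1))
feat[1, 1, 0, 0] = 5.0
flow = np.zeros((4, 4, 2, 3))
flow[1, 1, 0] = [1.0, 0.0, 0.0]
warped = warp_voxels(feat, flow)  # feature now sits at voxel (2, 1, 0)
```

In a training setup, a differentiable warp (for example, trilinear sampling) would normally replace the rounding step so that gradients from the future-frame loss can reach the flow head. The rounding version here is kept only because it is easy to follow.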