This technical report presents our solution, "occTransformer", for the 3D occupancy prediction track of the autonomous driving challenge at CVPR 2023. Our method builds on the strong baseline BEVFormer and improves its performance through several simple yet effective techniques. First, we employed data augmentation to increase the diversity of the training data and improve the model's generalization ability. Second, we used a strong image backbone to extract more informative features from the input images. Third, we incorporated a 3D U-Net head to better capture the spatial structure of the scene. Fourth, we added further loss functions to better optimize the model. In addition, we ensembled our model with the occupancy models BEVDet and SurroundOcc to further improve performance. Most importantly, we integrated the 3D detection model StreamPETR to enhance the model's ability to detect objects in the scene. With these methods, our solution achieved 49.23 mIoU on the 3D occupancy prediction track of the challenge.
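The abstract does not say how the BEVFormer-based model, BEVDet and SurroundOcc are combined, nor how the 49.23 mIoU figure is computed. The sketch below makes two assumptions. First, it assumes the ensemble takes a weighted average of per-voxel class probabilities before the argmax. Second, it computes mIoU as the mean over classes of TP / (TP + FP + FN). The names `ensemble_probs` and `miou` and the per-model weights are hypothetical illustrations, not the authors' code. The official challenge toolkit may differ, for example by accumulating statistics over the whole evaluation set and applying a visibility mask.

```python
import numpy as np


def ensemble_probs(prob_list, weights=None):
    """Weighted average of per-voxel class probabilities (hypothetical fusion rule).

    prob_list: list of arrays shaped (X, Y, Z, C), one per model, each summing to 1 over C.
    weights: optional per-model weights (assumption); uniform if None.
    """
    stacked = np.stack(prob_list, axis=0)  # (M, X, Y, Z, C)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the result stays a distribution
    # Contract the model axis: sum_m w_m * p_m, giving shape (X, Y, Z, C).
    return np.tensordot(weights, stacked, axes=(0, 0))


def miou(pred, gt, num_classes, ignore_mask=None):
    """Mean IoU over classes that appear in pred or gt (simplified, single sample).

    pred, gt: integer label grids of the same shape.
    ignore_mask: optional boolean grid; True voxels are excluded from scoring.
    """
    if ignore_mask is not None:
        pred = pred[~ignore_mask]
        gt = gt[~ignore_mask]
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()  # TP + FP + FN
        if union == 0:
            continue  # class absent from both grids: skip it
        inter = np.logical_and(p, g).sum()  # TP
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 4, 2, 3)  # tiny hypothetical grid with 3 classes

    def random_probs():
        logits = rng.normal(size=shape)
        e = np.exp(logits)
        return e / e.sum(axis=-1, keepdims=True)  # softmax over the class axis

    # Random stand-ins for the three models' outputs; no real predictions are used.
    model_probs = [random_probs() for _ in range(3)]
    fused = ensemble_probs(model_probs)
    pred = fused.argmax(axis=-1)
    gt = rng.integers(0, 3, size=shape[:3])
    print("mIoU:", miou(pred, gt, num_classes=3))
```

Averaging probabilities rather than voting on hard labels lets a confident model outweigh uncertain ones at each voxel. Per-model weights would typically be tuned on a validation split.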