Despite tremendous advances in bird's-eye view (BEV) perception, existing models fall short in generating realistic and coherent semantic map layouts, and they fail to account for uncertainty arising from partial sensor information (such as occlusion or limited coverage). In this work, we introduce MapPrior, a novel BEV perception framework that combines a traditional discriminative BEV perception model with a learned generative model for semantic map layouts. MapPrior delivers predictions with better accuracy, realism, and uncertainty awareness. We evaluate our model on the large-scale nuScenes benchmark. At the time of submission, MapPrior outperforms the strongest competing method, with significantly improved maximum mean discrepancy (MMD) and expected calibration error (ECE) scores in camera- and LiDAR-based BEV perception.
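The abstract reports calibration through the ECE score without defining it. As a rough illustration only, the sketch below computes a standard binned expected calibration error: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and empirical accuracy in each bin is averaged, weighted by bin size. The function name `expected_calibration_error`, the choice of 10 bins, and the flat list inputs are assumptions made for this example; the exact evaluation protocol used for MapPrior (for instance, how BEV pixels and classes are aggregated) is not specified in this text.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE over equal-width confidence bins on [0, 1].

    confidences: predicted probabilities in [0, 1], one per prediction.
    correct: booleans, True where the prediction matched the label.
    n_bins: number of bins (10 is a common default, assumed here).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hit_sum = [0.0] * n_bins   # number of correct predictions per bin

    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is clamped into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if ok else 0.0

    # (count / n) * |accuracy - mean confidence| simplifies to
    # |hits - summed confidence| / n for each bin; empty bins contribute 0.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n


if __name__ == "__main__":
    # Four predictions at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
```

A lower value means that predicted confidences track observed accuracy more closely; a perfectly calibrated set of predictions scores 0 under this definition.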