While bird's-eye-view (BEV) perception models can help build high-definition maps (HD-Maps) with less human labor, their results are often unreliable, and the HD-Maps they predict from different viewpoints are noticeably inconsistent. This is because BEV perception is typically set up in an 'onboard' manner, which restricts computation and consequently prevents algorithms from reasoning over multiple views simultaneously. This paper overcomes these limitations and advocates a more practical 'offboard' HD-Map generation setup that removes the computation constraints, based on the fact that HD-Maps are reusable infrastructure commonly built offline in data centers. To this end, we propose a novel offboard pipeline called MV-Map that capitalizes on multi-view consistency and, through the key design of a 'region-centric' framework, can handle an arbitrary number of frames. In MV-Map, the target HD-Maps are created by aggregating the onboard predictions from all frames, weighted by the confidence scores assigned by an 'uncertainty network'. To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF). Extensive experiments on nuScenes show that MV-Map significantly improves the quality of HD-Maps, further highlighting the importance of offboard methods for HD-Map generation.
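The confidence-weighted aggregation described above can be illustrated with a minimal sketch. It assumes that per-frame onboard predictions have already been warped into a common BEV grid for one map region and that confidence scores are non-negative scalars per cell; the function name `fuse_region`, the array shapes, and the simple normalized weighted average are illustrative assumptions, not the MV-Map implementation, whose uncertainty network is learned and additionally uses Voxel-NeRF structure.

```python
import numpy as np


def fuse_region(frame_probs, frame_conf, eps=1e-8):
    """Confidence-weighted average of per-frame map predictions for one region.

    frame_probs: array of shape (T, H, W, C), per-frame class probabilities
        already aligned to a shared BEV grid (hypothetical layout).
    frame_conf: array of shape (T, H, W), non-negative confidence scores,
        with 0 where a frame does not observe a cell (hypothetical layout).
    Returns an array of shape (H, W, C).
    """
    probs = np.asarray(frame_probs, dtype=np.float64)
    conf = np.asarray(frame_conf, dtype=np.float64)
    if probs.ndim != 4 or probs.shape[:3] != conf.shape:
        raise ValueError("expected probs (T, H, W, C) and conf (T, H, W)")
    if np.any(conf < 0):
        raise ValueError("confidence scores must be non-negative")

    # Normalize confidences over the frame axis so that the weights for
    # each cell sum to (approximately) one; eps guards unobserved cells.
    weights = conf / (conf.sum(axis=0, keepdims=True) + eps)

    # Weighted sum over frames; works for any number of frames T.
    return (weights[..., None] * probs).sum(axis=0)


if __name__ == "__main__":
    # Two frames disagree on a single cell; the more confident frame dominates.
    probs = np.array([[[[1.0, 0.0]]], [[[0.0, 1.0]]]])
    conf = np.array([[[3.0]], [[1.0]]])
    print(fuse_region(probs, conf))
```

Because the normalization runs over the frame axis, the same function accepts any number of frames, which mirrors the region-centric idea of fusing every frame that observes a region. Cells that no frame observes receive all-zero output in this sketch.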