Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks

In this paper, we introduce Mask2Map, a novel end-to-end online HD map construction method designed for autonomous driving applications. Our approach predicts the class and ordered point set of each map instance in a scene, represented in the bird's eye view (BEV). Mask2Map consists of two primary components: the Instance-Level Mask Prediction Network (IMPNet) and the Mask-Driven Map Prediction Network (MMPNet). IMPNet generates Mask-Aware Queries and BEV Segmentation Masks to capture comprehensive semantic information globally. MMPNet then enhances these query features with local contextual information through two submodules: the Positional Query Generator (PQG) and the Geometric Feature Extractor (GFE). PQG extracts instance-level positional queries by embedding BEV positional information into the Mask-Aware Queries, while GFE uses the BEV Segmentation Masks to generate point-level geometric features. However, we observed that the performance of Mask2Map is limited by inter-network inconsistency: IMPNet and MMPNet each match their predictions to the ground truth (GT) separately, and the two matchings can differ. To address this, we propose the Inter-network Denoising Training method, which guides the model to denoise outputs affected by both noisy GT queries and perturbed GT Segmentation Masks. Our evaluation on the nuScenes and Argoverse2 benchmarks shows that Mask2Map outperforms previous state-of-the-art methods by 10.1 mAP and 4.1 mAP, respectively. Our code can be found at https://github.com/SehwanChoi0307/Mask2Map.
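
To make the denoising idea concrete, the following is a minimal sketch of how noisy GT queries and perturbed GT Segmentation Masks might be produced, and how a mask can select BEV features. It is not the released implementation. The abstract does not specify the noise model, so random cell flips and random class replacement are assumptions, and masked average pooling is only a stand-in for using a mask to gather BEV features. The names `perturb_mask`, `noisy_label`, and `masked_average` are hypothetical.

```python
import random


def perturb_mask(mask, flip_prob, rng):
    """Flip each cell of a binary H x W mask with probability flip_prob (assumed noise model)."""
    return [[cell != (rng.random() < flip_prob) for cell in row] for row in mask]


def noisy_label(label, num_classes, flip_prob, rng):
    """With probability flip_prob, replace a GT class label by a uniformly random class."""
    if rng.random() < flip_prob:
        return rng.randrange(num_classes)
    return label


def masked_average(bev_feat, mask):
    """Average C x H x W BEV features over the cells where the H x W mask is True."""
    cells = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not cells:
        # An empty mask selects nothing; return a zero vector instead of dividing by zero.
        return [0.0] * len(bev_feat)
    return [sum(channel[r][c] for r, c in cells) / len(cells) for channel in bev_feat]


rng = random.Random(0)
height, width, channels, num_classes = 8, 8, 4, 3

# Toy GT mask: one horizontal instance (e.g. a lane divider) on row 3.
gt_mask = [[r == 3 for _ in range(width)] for r in range(height)]
# Toy BEV feature map with random values.
bev_feat = [[[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]

noisy_mask = perturb_mask(gt_mask, 0.1, rng)        # perturbed GT segmentation mask
noisy_class = noisy_label(1, num_classes, 0.2, rng)  # noisy GT query label
query = masked_average(bev_feat, noisy_mask)         # feature vector gathered by the mask
```

In such a setup, the model would be trained to recover the clean GT instance from `noisy_mask` and `noisy_class`. Because these inputs are derived from a known GT instance, their targets are fixed in advance and do not depend on a separate matching step.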
