High-Definition (HD) maps are essential for the safety of autonomous driving systems. While existing techniques employ camera images and onboard sensors to generate vectorized high-precision maps, they are constrained by their reliance on single-frame input. This approach limits their stability and performance in complex scenarios such as occlusions, largely due to the absence of temporal information. Moreover, their performance diminishes when applied to broader perception ranges. In this paper, we present StreamMapNet, a novel online mapping pipeline adept at long-sequence temporal modeling of videos. StreamMapNet employs multi-point attention and temporal information, which enable the construction of large-range local HD maps with high stability and address the limitations of existing methods. Furthermore, we critically examine two widely used datasets and benchmarks for online HD map construction, Argoverse2 and nuScenes, revealing significant bias in the existing evaluation protocols. We propose to resplit the benchmarks according to geographical spans, promoting fair and precise evaluations. Experimental results validate that StreamMapNet significantly outperforms existing methods across all settings while maintaining an online inference speed of $14.2$ FPS.