High-definition (HD) maps provide abundant and precise static environmental information about the driving scene and serve as a fundamental and indispensable component for planning in autonomous driving systems. In this paper, we present \textbf{Map} \textbf{TR}ansformer, an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, \ie, modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method copes well with map elements of arbitrary shape. It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets. Extensive qualitative results show stable and robust map construction quality in complex and varied driving scenes. Code and more demos are available at \url{https://github.com/hustvl/MapTR} to facilitate further studies and applications.
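To make the idea of a point set with a group of equivalent permutations concrete, the following is a minimal sketch and is not taken from the MapTR code base. It assumes, purely for illustration, that an open polyline can be traversed in two directions and that a closed polygon can start at any vertex in either direction. The function names `equivalent_permutations` and `permutation_invariant_l1`, and the choice of an L1 point distance, are hypothetical.

```python
import numpy as np


def equivalent_permutations(points: np.ndarray, closed: bool) -> np.ndarray:
    """Return every vertex ordering assumed to describe the same shape.

    points: (N, 2) array of ordered vertices (for a polygon, the first
            vertex is not repeated at the end).
    closed: True for a polygon, False for an open polyline.
    Returns an array of shape (P, N, 2); under the assumptions of this
    sketch, P = 2 for a polyline and P = 2 * N for a polygon.
    """
    if not closed:
        # Assumption: a polyline is the same shape in either direction.
        return np.stack([points, points[::-1]])
    n = len(points)
    # Assumption: a polygon may start at any vertex, in either direction.
    forward = [np.roll(points, -s, axis=0) for s in range(n)]
    reverse = [np.roll(points[::-1], -s, axis=0) for s in range(n)]
    return np.stack(forward + reverse)


def permutation_invariant_l1(pred: np.ndarray, target: np.ndarray,
                             closed: bool) -> float:
    """Smallest summed L1 distance between pred and any equivalent
    ordering of target (an illustrative cost, not the paper's exact one)."""
    perms = equivalent_permutations(target, closed)       # (P, N, 2)
    costs = np.abs(perms - pred[None]).sum(axis=(1, 2))   # (P,)
    return float(costs.min())


# A unit square, and the same square listed from a different start
# vertex in the opposite direction.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
same_square = np.roll(square[::-1], -1, axis=0)

naive_cost = float(np.abs(same_square - square).sum())
invariant_cost = permutation_invariant_l1(same_square, square, closed=True)
```

In this example the naive point-by-point cost is nonzero even though both arrays describe the same square, while the cost taken over the equivalent orderings is zero. This illustrates why fixing a single vertex order could give an ambiguous supervision target.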