High-definition (HD) maps provide abundant and precise static environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems. In this paper, we present \textbf{Map} \textbf{TR}ansformer (MapTR), an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, \ie, modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method copes well with map elements of arbitrary shape. It runs at real-time inference speed and achieves state-of-the-art performance on both the nuScenes and Argoverse2 datasets. Abundant qualitative results show stable and robust map construction quality in complex and varied driving scenes. Code and more demos are available at \url{https://github.com/hustvl/MapTR} to facilitate further studies and applications.
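To make the permutation-equivalent modeling concrete, the following minimal Python sketch enumerates orderings of a point set that describe the same shape and scores a prediction against the best-matching ordering. It is an illustration under stated assumptions rather than the released implementation: we assume that an open polyline has two equivalent orderings (forward and reversed) and that a closed polygon with $N$ points has $2N$ (every cyclic shift in both directions). The names \texttt{equivalent\_permutations}, \texttt{l1\_distance}, and \texttt{best\_match\_cost} are hypothetical, and the L1 point distance is chosen only for simplicity.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: List[Point], closed: bool) -> List[List[Point]]:
    """Enumerate orderings of `points` assumed to describe the same shape.

    Assumption (illustrative): an open polyline is unchanged by reversing
    its direction; a closed polygon is unchanged by reversing its direction
    and by choosing any vertex as the starting point.
    """
    forward = list(points)
    backward = list(reversed(points))
    if not closed:
        return [forward, backward]
    perms = []
    for seq in (forward, backward):
        for start in range(len(seq)):
            # Cyclic shift: begin the traversal at vertex `start`.
            perms.append(seq[start:] + seq[:start])
    return perms


def l1_distance(a: List[Point], b: List[Point]) -> float:
    """Sum of per-point L1 distances between two equally long sequences."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))


def best_match_cost(pred: List[Point], gt: List[Point], closed: bool) -> float:
    """Cost of `pred` against the closest equivalent ordering of `gt`."""
    return min(l1_distance(pred, perm) for perm in equivalent_permutations(gt, closed))


if __name__ == "__main__":
    lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    # A reversed polyline and a shifted polygon still match their ground truth.
    print(best_match_cost(list(reversed(lane)), lane, closed=False))
    print(best_match_cost(square[2:] + square[:2], square, closed=True))
```

Taking the minimum over the equivalent orderings means a prediction is not penalized for picking a different, equally valid start point or direction, which is the ambiguity that a single fixed ordering would impose on learning.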