High-definition (HD) maps provide abundant and precise environmental information about the driving scene and serve as a fundamental and indispensable component for planning in autonomous driving systems. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. With only camera input, MapTR achieves the best performance and efficiency among existing vectorized map construction approaches on the nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed ($25.1$ FPS) on an RTX 3090, $8\times$ faster than the existing state-of-the-art camera-based method while achieving $5.0$ higher mAP. Even compared with the existing state-of-the-art multi-modality method, MapTR-nano achieves $0.7$ higher mAP, and MapTR-tiny achieves $13.5$ higher mAP and $3\times$ faster inference speed. Abundant qualitative results show that MapTR maintains stable and robust map construction quality in complex and varied driving scenes. MapTR is of great application value in autonomous driving. Code and more demos are available at \url{https://github.com/hustvl/MapTR}.
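To make the idea of "a point set with a group of equivalent permutations" concrete, here is a minimal sketch; it is not taken from the MapTR code base. It assumes that an open polyline has two equivalent point orderings (forward and reversed) and that a closed polygon with $N$ points has $2N$ (any start point, either direction). The names `equivalent_permutations` and `min_point_distance` are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: Sequence[Point], closed: bool) -> List[List[Point]]:
    """Enumerate point orderings that describe the same shape.

    Assumption (for illustration only): an open polyline has 2 equivalent
    orderings, and a closed polygon with N points has 2 * N.
    """
    pts = list(points)
    if not closed:
        # Forward and reversed traversal trace the same polyline.
        return [pts, pts[::-1]]
    perms: List[List[Point]] = []
    for direction in (pts, pts[::-1]):
        # Any vertex of a closed polygon can serve as the start point.
        for shift in range(len(direction)):
            perms.append(direction[shift:] + direction[:shift])
    return perms


def min_point_distance(pred: Sequence[Point], gt: Sequence[Point], closed: bool) -> float:
    """Smallest summed L1 distance between `pred` and any equivalent ordering of `gt`."""

    def l1(a: Sequence[Point], b: Sequence[Point]) -> float:
        return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))

    return min(l1(pred, perm) for perm in equivalent_permutations(gt, closed))
```

Under these assumptions, a prediction that traces the ground-truth shape in the opposite direction, or starts a polygon at a different vertex, incurs no point-ordering penalty. This is one way to read the claim that the modeling stabilizes learning.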