Online mapping and end-to-end (E2E) planning in autonomous driving remain largely sensor-centric, leaving rich map priors, including HD/SD vector maps, rasterized SD maps, and satellite imagery, underused because of heterogeneity, pose drift, and inconsistent availability at test time. We present UMPE, a Unified Map Prior Encoder that can ingest any subset of these four priors and fuse them with BEV features for both mapping and planning. UMPE has two branches. The vector encoder pre-aligns HD/SD polylines with a frame-wise SE(2) correction, encodes points via multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. BEV queries then cross-attend to these tokens with a confidence bias, followed by normalized channel-wise gating to avoid length imbalance and softly down-weight uncertain sources. The raster encoder shares a ResNet-18 backbone conditioned by FiLM with scaling and shift at every stage, performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion, so the network starts from a do-no-harm baseline and learns to add only useful prior evidence. A vector-then-raster fusion order reflects the inductive bias of geometry first, appearance second. On nuScenes mapping, UMPE lifts MapTRv2 from 61.5 to 67.4 mAP (+5.9) and MapQR from 66.4 to 71.7 mAP (+5.3). On Argoverse2, UMPE adds 4.1 mAP over strong baselines. UMPE is compositional: when trained with all priors, it outperforms single-prior models even when only one prior is available at test time, demonstrating powerset robustness. For E2E planning with the VAD backbone on nuScenes, UMPE reduces the average L2 trajectory error from 0.72 m to 0.42 m (-0.30 m) and the collision rate from 0.22% to 0.12% (-0.10 percentage points), surpassing recent prior-injection methods. These results show that a unified, alignment-aware treatment of heterogeneous map priors yields better mapping and better planning.
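To make the vector-branch and fusion ideas above concrete, the following is a minimal, dependency-free sketch rather than the authors' implementation. All function names are hypothetical, and several details are assumptions the abstract does not specify: octave-spaced frequencies for the sinusoidal encoding, the logarithm of the confidence score as the additive attention bias, and a single scalar gate standing in for the zero-initialized residual path.

```python
import math


def se2_align(points, dx, dy, theta):
    """Apply a rigid SE(2) correction: rotate by theta, then translate by (dx, dy)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def sinusoidal_encode(point, num_freqs=4):
    """Encode one 2D point with sin/cos pairs at octave-spaced frequencies (assumed)."""
    feats = []
    for coord in point:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            feats.extend([math.sin(freq * coord), math.cos(freq * coord)])
    return feats  # length = 2 coordinates * num_freqs * 2


def confidence_biased_attention(query, keys, values, confidences):
    """Single-query dot-product attention with log-confidence added to the logits.

    Using log(confidence) as the bias is an assumption made for illustration.
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale + math.log(max(conf, 1e-6))
        for key, conf in zip(keys, confidences)
    ]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


def zero_init_residual_fuse(bev_feature, prior_feature, gate=0.0):
    """Residual fusion with a gate that starts at zero, so the output equals the input at init."""
    return [b + gate * p for b, p in zip(bev_feature, prior_feature)]


# Toy usage: align a two-point polyline, tokenize it, attend, then fuse.
polyline = [(0.0, 0.0), (1.0, 0.0)]
aligned = se2_align(polyline, dx=0.5, dy=0.0, theta=math.pi / 2)
tokens = [sinusoidal_encode(p) for p in aligned]
bev_query = [0.1] * len(tokens[0])
prior, attn = confidence_biased_attention(bev_query, tokens, tokens, [0.9, 0.1])
fused = zero_init_residual_fuse(bev_query, prior)  # gate=0.0, so fused == bev_query
```

With the gate at zero the fused feature is identical to the BEV feature, which is one simple way to read the "do-no-harm" starting point. In this sketch, training would move the gate away from zero only where the prior helps.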