A semantic map of the road scene, covering fundamental road elements, is an essential ingredient of autonomous driving systems. Rendered in Bird's-Eye-View (BEV), it provides an important perception foundation for localization and planning. Currently, methods that rely on a prior of hypothetical depth can use calibration parameters to guide the direct translation of front perspective views into BEV. However, these methods suffer from geometric distortions when representing distant objects. Another stream of methods, which uses no such prior, learns the transformation between front perspective views and BEV implicitly with a global view. Considering that fusing these different learning methods may bring surprising benefits, we propose a Bi-Mapper framework for top-down road-scene semantic understanding, which incorporates a global view and local prior knowledge. To enhance reliable interaction between the two, an asynchronous mutual learning strategy is proposed. At the same time, an Across-Space Loss (ASL) is designed to mitigate the negative impact of geometric distortions. Extensive results on the nuScenes and Cam2BEV datasets verify the consistent effectiveness of each module in the proposed Bi-Mapper framework. Compared with existing road mapping networks, the proposed Bi-Mapper achieves an IoU 5.0 higher on the nuScenes dataset. Moreover, we verify the generalization performance of Bi-Mapper in a real-world driving scenario. Code will be available at https://github.com/lynn-yu/Bi-Mapper.
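Why do depth-prior projections distort distant objects? The sketch below is a minimal illustration, not Bi-Mapper's projection module. It uses the flat-ground assumption behind inverse perspective mapping (IPM) as one example of a hypothetical-depth prior, and it assumes a level pinhole camera (zero pitch and roll) over a perfectly flat road. The function names `ground_distance` and `row_of_point` and the intrinsics (focal length 1000 px, principal row 450, camera height 1.5 m) are all hypothetical. The sketch shows two effects. First, a one-pixel row error becomes a much larger metric error at long range. Second, a point above the ground, such as the top of a car, is stretched far beyond its true distance.

```python
def ground_distance(v: float, f: float, cy: float, cam_height: float) -> float:
    """Forward distance (m) of the ground point imaged at row v.

    Hypothetical helper: assumes a flat ground plane and a level pinhole
    camera mounted cam_height metres above it (image y axis pointing down).
    """
    dv = v - cy
    if dv <= 0:
        # Rows at or above the horizon never intersect the ground plane.
        raise ValueError("row must lie below the horizon (v > cy)")
    return f * cam_height / dv


def row_of_point(z: float, y: float, f: float, cy: float, cam_height: float) -> float:
    """Image row of a point at forward distance z and height y above ground."""
    # The point lies (cam_height - y) metres below the optical centre.
    return cy + f * (cam_height - y) / z


if __name__ == "__main__":
    F, CY, H = 1000.0, 450.0, 1.5  # hypothetical intrinsics and mounting height

    # Metric error caused by a single-pixel row error, near vs. far.
    near_step = ground_distance(599, F, CY, H) - ground_distance(600, F, CY, H)
    far_step = ground_distance(479, F, CY, H) - ground_distance(480, F, CY, H)
    print(f"{near_step:.3f}")  # 0.067 (m, around 10 m range)
    print(f"{far_step:.3f}")   # 1.724 (m, around 50 m range)

    # A point 1 m above the road at 20 m is projected as if it lay on the ground.
    v = row_of_point(20.0, 1.0, F, CY, H)
    print(ground_distance(v, F, CY, H))  # 60.0
```

Under these assumptions, a row error of one pixel costs about 7 cm at 10 m but about 1.7 m at 50 m. An elevated point at 20 m lands at 60 m in BEV. Both effects grow with distance, which is the kind of distortion the abstract attributes to depth-prior methods for distant objects.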