Vision-centric Bird's-Eye View (BEV) representation is essential for autonomous driving systems (ADS). Multi-frame temporal fusion, which leverages historical information, has been shown to yield more comprehensive perception results. While most research focuses on ego-centric maps with fixed settings, long-range local map generation remains less explored. This work outlines a new paradigm, named NeMO, that generates local maps using a readable and writable big map, a learning-based fusion module, and an interaction mechanism between the two. Assuming that the feature distribution of all BEV grids follows an identical pattern, we adopt a shared-weight neural network across all grids to update the big map. This paradigm supports the fusion of longer time series and the generation of long-range BEV local maps. Furthermore, we release BDD-Map, a BDD100K-based dataset with map element annotations, including lane lines, boundaries, and pedestrian crossings. Experiments on the nuScenes and BDD-Map datasets demonstrate that NeMO outperforms state-of-the-art map segmentation methods. We also provide a new scene-level BEV map evaluation setting, along with a corresponding baseline, for a more comprehensive comparison.
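The read, fuse, and write interaction described above can be illustrated with a minimal sketch. It is not the NeMO implementation: the `BigMap` class, the `fuse` function, the sigmoid-gated update rule, the map and window sizes, and the random stand-in features are all assumptions made for illustration. The only idea taken from the abstract is that a single set of weights is shared by every BEV grid when the big map is updated.

```python
import numpy as np


class BigMap:
    """Hypothetical global feature map that can be read from and written to."""

    def __init__(self, height, width, channels):
        self.features = np.zeros((height, width, channels), dtype=np.float32)

    def read(self, row, col, size):
        # Copy of the size x size window whose top-left corner is (row, col).
        return self.features[row:row + size, col:col + size].copy()

    def write(self, row, col, patch):
        # Overwrite the window at (row, col) with the fused patch.
        size = patch.shape[0]
        self.features[row:row + size, col:col + size] = patch


def fuse(history, current, weight, bias):
    """Assumed gated update; the same weight and bias are used for every grid."""
    stacked = np.concatenate([history, current], axis=-1)    # (S, S, 2C)
    gate = 1.0 / (1.0 + np.exp(-(stacked @ weight + bias)))  # (S, S, C)
    return gate * history + (1.0 - gate) * current


rng = np.random.default_rng(0)
C, S = 4, 8  # feature channels and local window size (illustrative values)
big_map = BigMap(32, 32, C)
weight = rng.normal(scale=0.1, size=(2 * C, C)).astype(np.float32)
bias = np.zeros(C, dtype=np.float32)

for step in range(3):
    row, col = 4 * step, 4 * step  # the ego window moves across the big map
    current = rng.normal(size=(S, S, C)).astype(np.float32)  # stand-in BEV features
    history = big_map.read(row, col, S)
    big_map.write(row, col, fuse(history, current, weight, bias))
```

Because `weight` and `bias` do not depend on grid position, the same update applies to a window of any size or location. This is one way to read the shared-weight assumption as enabling long-range maps.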