Bird's-eye-view (BEV) representation is crucial for the perception function in autonomous driving tasks. Balancing the accuracy, efficiency, and range of a BEV representation is difficult: existing works are restricted to a limited perception range within 50 meters. Extending the BEV representation range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and planning by offering more comprehensive information and longer reaction time. Standard-Definition (SD) navigation maps provide a lightweight representation of road structure topology, characterized by ease of acquisition and low maintenance cost. An intuitive idea is to combine the close-range visual information from onboard cameras with the beyond line-of-sight (BLOS) environmental priors from SD maps to realize expanded perceptual capability. In this paper, we propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m. Our approach is applicable to common BEV architectures and achieves excellent results by incorporating information derived from SD maps. We explore various feature fusion schemes to effectively integrate the visual BEV representations and the semantic features from the SD map, aiming to optimally leverage the complementary information from both sources. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in BEV segmentation on the nuScenes and Argoverse benchmarks. Through multi-modal inputs, BEV segmentation is significantly enhanced at close ranges below 50m, while also demonstrating superior performance in long-range scenarios, surpassing other methods by over 20% mIoU at distances ranging from 50-200m.
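The fusion idea in the abstract can be sketched minimally: concatenate camera-derived BEV features with rasterized SD-map features along the channel axis, then mix channels with a 1x1 convolution. This is an illustrative sketch only, not the paper's actual architecture; all names, shapes, and the choice of concatenation (one of several possible fusion schemes) are assumptions.

```python
import numpy as np

def fuse_concat(bev_feat: np.ndarray, sd_feat: np.ndarray,
                w: np.ndarray) -> np.ndarray:
    """Concatenate visual BEV features with SD-map features along the
    channel axis, then mix channels with a 1x1 convolution (implemented
    here as a per-pixel linear map over channels)."""
    x = np.concatenate([bev_feat, sd_feat], axis=0)   # (C1+C2, H, W)
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(-1, h, wd)

# Toy shapes (hypothetical): a 200m x 100m BEV grid at 0.5 m/cell
# gives a 400 x 200 cell grid.
bev = np.random.rand(64, 400, 200).astype(np.float32)  # camera BEV features
sd = np.random.rand(16, 400, 200).astype(np.float32)   # rasterized SD map
w = np.random.rand(32, 64 + 16).astype(np.float32)     # 1x1 conv weights
fused = fuse_concat(bev, sd, w)
print(fused.shape)  # (32, 400, 200)
```

Concatenation keeps both modalities intact and lets the learned weights decide how to weight close-range visual evidence against the map prior at each cell; alternatives such as element-wise addition or cross-attention trade capacity for cost.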