Recent sensor fusion in Bird's Eye View (BEV) space has proven useful for tasks such as 3D detection and map segmentation. However, the approach struggles with inaccurate camera BEV estimation and with the perception of distant areas, where LiDAR points are sparse. In this paper, we propose a broad BEV fusion (BroadBEV) that addresses these problems through cross-modal spatial synchronization. Our strategy aims to enhance camera BEV estimation for broad-sighted perception while simultaneously improving the completion of sparse LiDAR data across the entire BEV space. To that end, we devise Point-scattering, which scatters the LiDAR BEV distribution to the camera depth distribution. The method boosts depth-estimation learning in the camera branch and induces accurate placement of dense camera features in BEV space. For effective BEV fusion between the spatially synchronized features, we propose ColFusion, which applies the self-attention weights of the LiDAR and camera BEV features to each other. Our extensive experiments demonstrate that BroadBEV provides broad-sighted BEV perception with remarkable performance gains.
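As a rough illustration of the cross-applied attention idea behind ColFusion, the following NumPy sketch computes self-attention weights for each modality and applies them to the other modality's BEV features. The abstract does not specify projections, normalization, or the final fusion operator, so the scaled dot-product weights, the concatenation step, and all function names here are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_weights(feats):
    """Return (N, N) row-normalized attention weights for (N, C) features.

    Assumption: plain scaled dot-product similarity with no learned
    query/key projections.
    """
    scale = np.sqrt(feats.shape[-1])
    return softmax(feats @ feats.T / scale, axis=-1)


def cross_applied_fusion(lidar_bev, camera_bev):
    """Apply each modality's self-attention weights to the other modality.

    Both inputs are flattened BEV grids of shape (H * W, C).
    Assumption: the two re-weighted feature maps are fused by channel
    concatenation, giving an output of shape (H * W, 2 * C).
    """
    w_lidar = self_attention_weights(lidar_bev)
    w_camera = self_attention_weights(camera_bev)
    lidar_out = w_camera @ lidar_bev    # LiDAR features mixed by camera weights
    camera_out = w_lidar @ camera_bev   # camera features mixed by LiDAR weights
    return np.concatenate([lidar_out, camera_out], axis=-1)


# Hypothetical toy inputs: a 4 x 4 BEV grid with 8 channels per cell.
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
lidar_bev = rng.standard_normal((h * w, c))
camera_bev = rng.standard_normal((h * w, c))

fused = cross_applied_fusion(lidar_bev, camera_bev)
print(fused.shape)  # (16, 16)
```

Full attention over all BEV cells scales quadratically with the grid size, so a practical model would likely restrict or factorize the attention; this sketch ignores that concern to keep the exchange of weights easy to read.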