RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems

In autonomous driving, LiDAR and radar play important roles in perceiving the surrounding environment. LiDAR provides accurate 3D spatial sensing but fails in adverse weather such as fog. The radar signal, on the other hand, can diffract around raindrops and mist particles because of its longer wavelength, but it is very noisy. Recent state-of-the-art works show that fusing radar and LiDAR leads to robust detection in adverse weather. These works use convolutional neural networks to extract features from each sensor's data, then align and aggregate the two feature branches to predict detections. However, their bounding box estimates are inaccurate because of simplistic label assignment and fusion strategies. In this paper, we propose an anchor box-free object detection system based on bird's-eye view (BEV) fusion learning, which fuses features derived from the radar range-azimuth heatmap and the LiDAR point cloud to detect objects. We design several label assignment strategies that promote consistency between the foreground/background classification of anchor points and the corresponding bounding box regressions. In addition, a novel interactive transformer module further improves the detector's performance. We demonstrate the superior performance of the proposed methods on the recently published Oxford Radar RobotCar dataset. When trained under 'Clear+Foggy' conditions, our system's average precision at an IoU threshold of 0.8 exceeds that of the state-of-the-art method by 13.1% and 19.0% on the 'Clear' and 'Foggy' test sets, respectively.
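
The headline results are reported as average precision at an IoU threshold of 0.8, so a prediction typically counts as a true positive only if its bird's-eye-view overlap with a ground-truth box reaches 0.8. The sketch below shows one way to compute that overlap for oriented BEV boxes using Sutherland-Hodgman polygon clipping. It assumes boxes are given as (x, y, w, l, yaw) with yaw in radians; that parameterization and all function names (box_corners, clip_polygon, bev_iou) are hypothetical illustrations, not the authors' evaluation code.

```python
import math


def box_corners(x, y, w, l, yaw):
    """Corners of an oriented BEV box in counter-clockwise order.

    Assumed parameterization (hypothetical): center (x, y), length l along
    the heading direction, width w across it, heading yaw in radians.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(l / 2, w / 2), (-l / 2, w / 2), (-l / 2, -w / 2), (l / 2, -w / 2)]
    # Rotate each local corner by yaw, then translate to the box center.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in local]


def polygon_area(poly):
    """Shoelace formula; returns 0.0 for an empty or degenerate polygon."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _side(a, b, p):
    """Cross product: >= 0 when p lies on or left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(a, b, p, q):
    """Point where segment p->q crosses the infinite line through a and b."""
    d1, d2 = _side(a, b, p), _side(a, b, q)
    t = d1 / (d1 - d2)  # p and q are on opposite sides, so d1 != d2
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_polygon(subject, clipper):
    """Sutherland-Hodgman: clip `subject` by the convex CCW polygon `clipper`."""
    output = list(subject)
    n = len(clipper)
    for i in range(n):
        if not output:
            break  # polygons are disjoint
        a, b = clipper[i], clipper[(i + 1) % n]
        inputs, output = output, []
        prev = inputs[-1]
        for cur in inputs:
            cur_in = _side(a, b, cur) >= 0
            prev_in = _side(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(a, b, prev, cur))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(a, b, prev, cur))
            prev = cur
    return output


def bev_iou(box_a, box_b):
    """Intersection over union of two oriented BEV boxes."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter = polygon_area(clip_polygon(pa, pb))
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred = (10.0, 5.0, 1.8, 4.5, 0.10)  # made-up prediction
    gt = (10.2, 5.1, 1.8, 4.6, 0.05)    # made-up ground truth
    iou = bev_iou(pred, gt)
    print(f"IoU = {iou:.3f}, matched at 0.8: {iou >= 0.8}")
```

Because both boxes are convex, clipping one rectangle against the other yields their exact intersection polygon, which keeps the sketch dependency-free. A production evaluator would usually rely on a vetted geometry library or the benchmark's own scripts instead.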
