Camera, LiDAR, and radar are common perception sensors for autonomous driving tasks. Robust 3D object detection is best achieved by fusing these sensors, yet exploiting their abilities wisely remains a challenge because each sensor has its own characteristics. In this paper, we propose FADet, a multi-sensor 3D detection network that specifically studies the characteristics of different sensors through our local featured attention modules. For camera images, we propose a dual-attention-based sub-module; for LiDAR point clouds, a triple-attention-based sub-module is utilized; and a mixed-attention-based sub-module is applied to the features of radar points. With these local featured attention sub-modules, FADet achieves effective detection results in long-tail and complex scenes from camera, LiDAR, and radar input. On the nuScenes validation dataset, FADet achieves state-of-the-art performance on LiDAR-camera object detection with 71.8% NDS and 69.0% mAP, and on radar-camera object detection with 51.7% NDS and 40.3% mAP. Code will be released at https://github.com/ZionGo6/FADet.