Accurate detection and tracking of surrounding objects are essential for self-driving vehicles. While Light Detection and Ranging (LiDAR) sensors set the benchmark for performance, camera-only solutions are appealing for their cost-effectiveness. Notably, although Radio Detection and Ranging (RADAR) sensors are prevalent in automotive systems, their potential for 3D detection and tracking has been largely disregarded because of data sparsity and measurement noise. Recently, the combination of RADARs and cameras has emerged as a promising solution. This paper presents Camera-RADAR 3D Detection and Tracking (CR3DT), a camera-RADAR fusion model for 3D object detection and Multi-Object Tracking (MOT). Building on the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT achieves substantial improvements in both detection and tracking by incorporating the spatial and velocity information of the RADAR sensor. Experimental results on the nuScenes dataset show an absolute improvement of 5.3% in mean Average Precision (mAP) for detection and a 14.9% increase in Average Multi-Object Tracking Accuracy (AMOTA) for tracking when both modalities are leveraged. By capitalizing on the ubiquity of RADAR in automotive applications, CR3DT bridges the gap between high-performance and cost-effective perception systems in autonomous driving.