Long-range 3D object detection is crucial for the safety and efficiency of self-driving cars, allowing them to perceive and react to objects, obstacles, and potential hazards from a distance. However, most current state-of-the-art LiDAR-based methods are limited by the sparsity of range sensors, which creates a form of domain gap between points close to the ego vehicle and points far from it. A related problem is the label imbalance for faraway objects, which inhibits the performance of deep neural networks at long range. Image features could benefit long-range detection, and some recently proposed multimodal methods incorporate them, but these methods either do not scale well computationally at long range or are limited by depth estimation accuracy. To address these limitations, we propose combining two LiDAR-based 3D detection networks, one specializing in near- to mid-range objects and one in long-range objects. To train a long-range detector under a scarce-label regime, we further propose weighting the loss according to each labelled object's distance from the ego vehicle. To mitigate LiDAR sparsity, we leverage Multimodal Virtual Points (MVP), an image-based depth completion algorithm, to enrich our data with virtual points. Our method, which combines two range experts trained with MVP and which we refer to as RangeFSD, achieves state-of-the-art performance on the Argoverse2 (AV2) dataset, with improvements at long range. The code will be released soon.
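The two ideas named in the abstract, distance-dependent loss weighting and the combination of two range experts, can be illustrated with a minimal sketch. The abstract does not give the weighting function, the rule for merging the experts' outputs, or any thresholds, so everything here is an assumption made for illustration: the linear-then-capped weight, the normalization by the sum of weights, the hard range split, and the names `distance_weight`, `weighted_loss`, `merge_range_experts`, `ref_range`, `max_weight`, and `split_range` are hypothetical and are not taken from RangeFSD.

```python
import math


def distance_weight(center_xy, ref_range=50.0, max_weight=4.0):
    """Hypothetical weight: 1.0 up to ref_range, then linear in distance, capped."""
    dist = math.hypot(center_xy[0], center_xy[1])  # distance from the ego vehicle
    return min(max(1.0, dist / ref_range), max_weight)


def weighted_loss(per_object_losses, centers_xy, ref_range=50.0, max_weight=4.0):
    """Weighted mean of per-object losses, so faraway labels count for more."""
    weights = [distance_weight(c, ref_range, max_weight) for c in centers_xy]
    if not weights:
        return 0.0
    total = sum(w * loss for w, loss in zip(weights, per_object_losses))
    return total / sum(weights)


def merge_range_experts(near_dets, far_dets, split_range=50.0):
    """Assumed merge rule: trust the near expert inside split_range and the
    far expert beyond it. Each detection is a dict with 'x' and 'y' keys."""
    def dist(det):
        return math.hypot(det["x"], det["y"])

    kept = [d for d in near_dets if dist(d) <= split_range]
    kept += [d for d in far_dets if dist(d) > split_range]
    return kept
```

With these assumed defaults, an object 100 m from the ego vehicle contributes twice the weight of one at 10 m, and the merge step discards any near-expert box beyond 50 m in favour of the far expert's output. A real system would more likely blend the experts in an overlap band or apply non-maximum suppression across both outputs; the hard split is used here only to keep the sketch short.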