Fusing camera and LiDAR information has become a de facto standard for 3D object detection. Current methods use point clouds from the LiDAR sensor as queries to retrieve features from the image space. However, we find that this underlying assumption leaves the current fusion frameworks unable to produce any prediction when the LiDAR malfunctions, whether the malfunction is minor or major. This fundamentally limits their deployment in realistic autonomous driving scenarios. In contrast, we propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on LiDAR input, thus addressing this weakness of previous methods. We empirically show that our framework surpasses state-of-the-art methods under the normal training settings. Under robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses state-of-the-art methods by 15.7% to 28.9% mAP. To the best of our knowledge, ours is the first framework to handle realistic LiDAR malfunctions, and it can be deployed in realistic scenarios without any post-processing procedure. The code is available at https://github.com/ADLab-AutoDrive/BEVFusion.
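The central design point above, a camera stream that never consumes LiDAR data, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`camera_stream`, `lidar_stream`, `fuse`, `detect`), the list-of-lists "BEV feature map", and the element-wise sum used for fusion are all assumptions made purely for illustration. The only property it is meant to show is that the pipeline still returns an output when the LiDAR input is missing.

```python
from typing import List, Optional, Tuple

BEV = List[List[float]]  # toy H x W bird's-eye-view feature map (assumption)


def camera_stream(image: List[List[float]]) -> BEV:
    """Hypothetical camera branch: uses only the image, never LiDAR points."""
    return [[2.0 * px for px in row] for row in image]


def lidar_stream(points: Optional[List[Tuple[int, int]]], h: int, w: int) -> Optional[BEV]:
    """Hypothetical LiDAR branch: returns None when the sensor yields no points."""
    if not points:
        return None
    grid = [[0.0] * w for _ in range(h)]
    for x, y in points:
        if 0 <= y < h and 0 <= x < w:
            grid[y][x] += 1.0  # count points per BEV cell
    return grid


def fuse(cam: BEV, lidar: Optional[BEV]) -> BEV:
    """Placeholder fusion (element-wise sum); falls back to camera features alone."""
    if lidar is None:
        return cam
    return [[c + l for c, l in zip(c_row, l_row)] for c_row, l_row in zip(cam, lidar)]


def detect(image: List[List[float]], points: Optional[List[Tuple[int, int]]]) -> BEV:
    """Run both branches independently, then fuse whatever is available."""
    h, w = len(image), len(image[0])
    return fuse(camera_stream(image), lidar_stream(points, h, w))
```

In a design where LiDAR points serve as queries into the image features, the equivalent of `detect(image, None)` would have nothing to query with; here the camera branch alone still yields a feature map.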