While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, an alternative approach is often overlooked: leveraging intelligent roadside cameras to extend perception beyond the visual range. We find that state-of-the-art vision-centric bird's-eye-view detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering depth relative to the camera center, where the depth difference between a car and the ground shrinks quickly as distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. In essence, instead of predicting pixel-wise depth, we regress the height above the ground, yielding a distance-agnostic formulation that eases the optimization of camera-only perception methods. On popular 3D detection benchmarks for roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. The code is available at https://github.com/ADLab-AutoDrive/BEVHeight.
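The depth-versus-height argument can be illustrated with a toy calculation. The sketch below is not taken from the BEVHeight code; it assumes an idealized camera mounted at a hypothetical height of 6 m over flat ground and a car roof at a hypothetical 1.5 m, and compares the range to a roof point with the range to the ground point directly beneath it. Under these assumptions the depth gap shrinks roughly as the inverse of distance, whereas the height gap stays fixed. The function names and default values are illustrative only.

```python
import math


def depth_gap(distance_m, cam_height_m=6.0, roof_height_m=1.5):
    """Range to the ground point minus range to the roof point above it.

    Assumes a camera at (0, cam_height_m) over flat ground, with both
    points at the same horizontal distance from the camera pole.
    """
    ground_range = math.hypot(distance_m, cam_height_m)
    roof_range = math.hypot(distance_m, cam_height_m - roof_height_m)
    return ground_range - roof_range


def height_gap(distance_m, roof_height_m=1.5):
    """Height of the roof above the ground; independent of distance."""
    return roof_height_m


if __name__ == "__main__":
    for d in (10, 50, 100, 200):
        print(f"distance={d:4d} m  depth gap={depth_gap(d):.3f} m  "
              f"height gap={height_gap(d):.3f} m")
```

In this toy setting, a network that regresses depth must separate car pixels from ground pixels using a target gap that becomes very small at long range, while a height target keeps the same gap at every distance.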