While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, an alternative approach is often overlooked: leveraging intelligent roadside cameras to extend perception beyond the visual range. We find that state-of-the-art vision-centric bird's-eye-view detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering depth relative to the camera center, where the depth difference between a car and the ground shrinks quickly as distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. In essence, instead of predicting pixel-wise depth, we regress the height above the ground, yielding a distance-agnostic formulation that eases the optimization of camera-only perception methods. On popular 3D detection benchmarks for roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. The code is available at \url{https://github.com/ADLab-AutoDrive/BEVHeight}.
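The geometric intuition behind the depth-versus-height argument can be illustrated with a small numeric sketch. This is not taken from the BEVHeight code base; it is one simplified reading of the claim. It assumes a 2D side view, a camera mounted at a hypothetical height of 6 m, a car roof 1.5 m high, and it measures "depth" as the straight-line range from the camera center. The function names `depth_gap` and `height_gap` are illustrative only.

```python
import math


def depth_gap(distance_m, cam_height_m=6.0, roof_height_m=1.5):
    """Range difference between a ground point and the roof point directly above it.

    Both points lie at the same horizontal distance from the camera pole.
    The mounting height and roof height are assumed values for illustration.
    """
    ground_range = math.hypot(distance_m, cam_height_m)
    roof_range = math.hypot(distance_m, cam_height_m - roof_height_m)
    return ground_range - roof_range


def height_gap(distance_m, roof_height_m=1.5):
    """Height difference between roof and ground; independent of distance."""
    return roof_height_m


if __name__ == "__main__":
    for d in (10, 50, 100, 200):
        print(f"distance={d:4d} m  depth gap={depth_gap(d):.3f} m  "
              f"height gap={height_gap(d):.3f} m")
```

Under these assumptions the depth gap falls roughly as the inverse of the distance, while the height gap stays constant, which is the sense in which a height target can be called distance-agnostic.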